Posted to dev@ctakes.apache.org by Jeffrey Miller <je...@gmail.com> on 2019/06/14 20:03:30 UTC

Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Hi,
I have created a custom dictionary from the latest UMLS release with
SNOMEDCT_US and RxNorm, and I've noticed that it generates a .script
file with unexpected differences compared to the sno_rx_16ab file
available as part of the cTAKES release. Specifically, for diabetes, it is
missing these two rows:
INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')

and only has this one:
INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')

The end result is that "diabetes" is not being picked up in the test text I
am running through; it requires the full 'diabetes mellitus'.

Is there any setting on the UMLS install side or in the cTAKES dictionary
creator that could account for missing alternative forms like this? I've
tried downloading the 2016AB release (which I think is the one used to
create the bundled sno_rx_16ab package?) and I am not getting the alternate
forms in that dictionary either.
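
One way to pin down exactly which rows differ is to compare the CUI_TERMS
inserts of the two .script files directly. The Python below is a minimal
sketch, not part of cTAKES: the function names are made up for illustration,
and it assumes every row has the single-line form shown above (five values,
the fourth being the term text).

```python
import re
from collections import defaultdict

# Matches rows such as: INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
# Assumption: one row per line, with '' used to escape a quote inside a value.
ROW = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)"
)

def terms_by_cui(lines):
    """Map each CUI to the set of term texts found in CUI_TERMS rows."""
    terms = defaultdict(set)
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            cui = int(match.group(1))
            text = match.group(4).replace("''", "'")
            terms[cui].add(text)
    return terms

def missing_terms(reference, custom, cui):
    """Terms listed for `cui` in `reference` but absent from `custom`."""
    return sorted(reference.get(cui, set()) - custom.get(cui, set()))

# The rows quoted in this message, standing in for the two files.
reference = terms_by_cui([
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')",
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')",
])
custom = terms_by_cui([
    "INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')",
])

print(missing_terms(reference, custom, 11849))
```

For real files, an open file handle can be passed in place of the lists,
since the function only iterates over lines.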

Thanks,
Jeff

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

Oy, I need to better document this stuff ...  

> does not make it into the dictionary when Snomed is installed alongside it using MetamorphoSys
-- This is indeed some strange behavior.  It seems that a bit of testing and debugging needs to be done.

>  is there a practical difference in the resulting cTAKES dictionary if you select the Source and Target 
-- https://cwiki.apache.org/confluence/display/CTAKES/Dictionary+Creator+GUI
-- There shouldn't be any difference in the synonyms used.  
-- Source indicates "use all the cuis that exist in this source, and get the synonyms for those cuis from all vocabularies."
-- Target indicates "write the original vocabulary-specific codes (e.g. snomed codes) to a table in the resulting database."
-- Selecting all vocabularies as "Source" would add more desired cuis (all of them) and therefore more synonyms: more synonyms altogether, not more synonyms per term (cui).
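
The Source behaviour described above can be sketched in a few lines of
Python. This is only an illustration of the stated rule (CUI set from the
selected sources, synonyms from every installed vocabulary), not the actual
dictionary creator code. The rows are toy data: the vocabulary assignments
are guesses based on this thread, and CUI 99999 is invented.

```python
from collections import defaultdict

# Toy rows of (cui, vocabulary, term); illustrative only.
ALL_ROWS = [
    (11849, "SNOMEDCT_US", "diabetes mellitus"),
    (11849, "MEDCIN", "dm"),
    (11849, "CHV", "diabete mellitus"),
    (99999, "MEDCIN", "some medcin-only concept"),
]

def build_synonyms(installed_rows, source_vocabs):
    """CUIs come from the selected sources; synonyms come from every installed row."""
    wanted_cuis = {cui for cui, vocab, _ in installed_rows if vocab in source_vocabs}
    synonyms = defaultdict(set)
    for cui, _, term in installed_rows:
        if cui in wanted_cuis:
            synonyms[cui].add(term)
    return dict(synonyms)

# Full install, only SNOMEDCT_US selected as Source: one CUI, three synonyms.
full = build_synonyms(ALL_ROWS, {"SNOMEDCT_US"})

# Subset install (only SNOMEDCT_US rows exist on disk): same CUI, one synonym.
snomed_only_rows = [row for row in ALL_ROWS if row[1] == "SNOMEDCT_US"]
subset = build_synonyms(snomed_only_rows, {"SNOMEDCT_US"})

# Every vocabulary selected as Source: more CUIs, same synonyms per CUI.
everything = build_synonyms(ALL_ROWS, {"SNOMEDCT_US", "MEDCIN", "CHV"})

print(sorted(full[11849]))
print(sorted(subset[11849]))
print(sorted(everything))
```

The subset case is the one that matters for this thread: with fewer
vocabularies installed, the same Source selection yields fewer synonyms per
CUI.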

I hope that all makes sense.

Sean

________________________________________
From: Jeffrey Miller <je...@gmail.com>
Sent: Tuesday, June 25, 2019 10:11 AM
To: dev@ctakes.apache.org
Subject: Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Hi Sean,

Thanks for the clarification. I think that helps explain some of the
unexpected synonyms that appear in the sno_rx_16ab dictionary. For example,
DM for diabetes mellitus is coming in from another ontology (could be
MEDCIN) that was installed as part of UMLS; it was not manually added to
sno_rx_16ab. I suspect this confusion stems from people who only installed
the subset of UMLS they were interested in, like only installing snomed and
rxnorm using MetamorphoSys. If you do that and compare the resulting cTAKES
dictionary to sno_rx_16ab, it will be missing many synonyms. I did
realize where the "diabete mellitus" was coming from: this is from the
Consumer Health Vocabulary (CHV, also part of UMLS), which intentionally
contains common misspellings and other term usages (see
https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CHV/).

One thing I noticed: there appears to be a reconciliation process when
processing synonyms from other ontologies in the dictionary creator. It
seems to reduce the number of synonyms for a CUI when the text span of one
term is already covered by another term in the same CUI, but the result can
sometimes be a little odd. For example, when you choose snomed and rxnorm,
but have other ontologies available for synonyms, I think 'diabetes' (from
another ontology, MEDCIN for one, but mapped to the same CUI) ends up
consuming "diabetes mellitus", so that term does not actually appear (you
can see this in sno_rx_16ab), but "diabete mellitus" does persist (likely
because "diabetes" is not contained in that string).

grep -i "'diabetes mellitus'" sno_rx_16ab.script
INSERT INTO PREFTERM VALUES(11849,'Diabetes Mellitus')
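
The reconciliation guessed at above could look something like the following
Python. To be clear, this is a sketch of the hypothesis only, written to
show that it would reproduce the observed rows; it is not taken from the
dictionary creator source, and the function names are invented.

```python
def contains_tokens(longer, shorter):
    """True if the tokens of `shorter` occur as a contiguous run inside `longer`."""
    n, m = len(longer), len(shorter)
    return any(longer[i:i + m] == shorter for i in range(n - m + 1))

def drop_covered_terms(terms):
    """Drop any term that fully contains a shorter term of the same CUI."""
    tokenized = {term: term.split() for term in terms}
    kept = []
    for term, tokens in tokenized.items():
        covered = any(
            len(other_tokens) < len(tokens) and contains_tokens(tokens, other_tokens)
            for other, other_tokens in tokenized.items()
            if other != term
        )
        if not covered:
            kept.append(term)
    return kept

# Candidate synonyms for CUI 11849 when all of UMLS is installed.
terms = ["dm", "diabetes", "diabetes mellitus", "diabete mellitus"]
print(drop_covered_terms(terms))
# prints ['dm', 'diabetes', 'diabete mellitus']
```

Under this rule "diabetes mellitus" is dropped because the single token
"diabetes" already covers it, while "diabete mellitus" survives because no
shorter term matches any of its tokens.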

There are other examples of similar issues. For example, for CUI 729346,
"juvenile osteochondrosis" is present in a dictionary created with only
snomed installed, but if you also install CHV, it does not make it into the
final dictionary; only these do:

729346|2|3|osteochondropathy - juven|juven
729346|1|2|osteochondritis juvenilis|juvenilis
729346|1|2|juvenile osteochondritis|osteochondritis

A specific example that I have run into involves HPO alone versus a
dictionary created when Snomed was also available for synonyms. In that
case a few oddities arise. For example, "severe short
stature", which is in the HPO, does not make it into the dictionary when
Snomed is installed alongside it using MetamorphoSys, but is in there if HPO
alone is installed.
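
To trace which source vocabulary contributes a given synonym, the UMLS
MRCONSO.RRF file can be scanned directly. The Python below is a minimal
sketch: it relies on the standard pipe-delimited MRCONSO layout (CUI in
column 0, SAB in column 11, STR in column 14) and assumes that the numeric
11849 in the .script file corresponds to UMLS CUI C0011849. The sample rows
are placeholders built by a helper invented for this example, not real
MRCONSO lines.

```python
# Column positions in MRCONSO.RRF (pipe-delimited).
CUI_COL, SAB_COL, STR_COL = 0, 11, 14

def sources_for_term(lines, cui, term):
    """Return the sorted source vocabularies (SAB) that list `term` under `cui`."""
    wanted = term.lower()
    found = set()
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if (len(fields) > STR_COL
                and fields[CUI_COL] == cui
                and fields[STR_COL].lower() == wanted):
            found.add(fields[SAB_COL])
    return sorted(found)

def placeholder_row(cui, sab, text):
    """Build a fake 18-column row with only CUI, SAB and STR filled in."""
    fields = [""] * 18
    fields[CUI_COL], fields[SAB_COL], fields[STR_COL] = cui, sab, text
    return "|".join(fields) + "|"

# Toy data mirroring what this thread reports; not copied from UMLS.
sample = [
    placeholder_row("C0011849", "SNOMEDCT_US", "Diabetes Mellitus"),
    placeholder_row("C0011849", "CHV", "diabetes mellitus"),
    placeholder_row("C0011849", "CHV", "diabete mellitus"),
]

print(sources_for_term(sample, "C0011849", "diabete mellitus"))
print(sources_for_term(sample, "C0011849", "diabetes mellitus"))
```

Run against a real MRCONSO.RRF (by passing the open file instead of
`sample`), the same lookup would show whether a term such as "severe short
stature" is listed by HPO alone or by other installed sources as well.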

Out of curiosity, is there a practical difference in the resulting cTAKES
dictionary if you select the Source and Target columns for one ontology
(and nothing else), versus selecting the Source and Target columns for one
ontology and just the Source column for all other installed ontologies? I
know that with the Source of all the ontologies checked, the ontology terms
all end up in the CUI_TERMS table, but since they aren't in any target
table, would the effect be the same as leaving them unchecked (the synonyms
of the unchecked ontologies would be matched when running cTAKES if they
were of the same CUI as the selected ontology)?

Thanks,
Jeff

On Mon, Jun 24, 2019 at 10:58 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> The dictionary creator uses the CUI set from selected sources, but
> synonyms from all available sources for CUIs in that set.
>
> I am not sure what is going on with the 's' in "diabetes".  A grep for
> "diabetes mellitus" and "diabete mellitus" in the umls mrconso file might
> have a hint.  Perhaps some code thinks that it is fixing a plural term?
>
> Sean
> ________________________________________
> From: Jeffrey Miller <je...@gmail.com>
> Sent: Tuesday, June 18, 2019 10:23 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks Sean. I actually think I figured out what is causing the difference.
> When I create the UMLS install on my machine, I only install RxNorm and
> SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
> two sources on the left. I noticed in the screenshots on the wiki page for
> the dictionary creator GUI that many sources were installed, but only
> SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
> active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
> dictionaryCreator GUI) and it made a difference as to which terms appeared
> in the final cTAKES dictionary. As an example, I now get the "DM" entry for
> diabetes. I don't know why this should make a difference, but it appears
> that it does.
>
> Another odd observation related to this. In the sno_rx_16ab file, I
> noticed there seems to be an error:
> INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')
>
> The 's' is missing from diabetes. When I created my dictionary (from the
> restricted UMLS install, but still 2016ab) the cTAKES dictionary entry for
> that term is correct:
> INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
>
> When I created the dictionary from the full UMLS install tonight, that
> error appeared again.
>
> Jeff
>
>
>
> On Mon, Jun 17, 2019 at 8:08 PM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years
> > ago I can't swear to any of those filter sets being exactly what was used.
> >
> > I think that the key to working with any project is to check the
> > dictionary against a project's needs.  Fill in the gaps by either editing
> > the sql (.script) file or by adding a second dictionary.  In smaller
> > "focus" projects I usually end up augmenting the default dictionary with a
> > small custom bsv dictionary to catch any known synonyms or terms that
> > aren't represented in the default.  In projects requiring larger nets I
> > have built dictionaries that are horribly inclusive - 2 to 3 times the
> > sno_rx_16ab.
> >
> > Sean
> > ________________________________________
> > From: Jeffrey Miller <je...@gmail.com>
> > Sent: Monday, June 17, 2019 4:39 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > Thanks for following up Sean. I've looked into the links you sent along.
> > There are different groups of filters and it appears that the
> > dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
> > directory. I don't think this is the set of filters used to make
> > sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
> > veterinary product.  310367) in "UnwantedTexts.txt", but the
> > sno_rx_16ab.script file has "today" still in there. If you create a
> > dictionary with the dictionary builder, it does not include that term.
> >
> > I thought maybe the set of files under the "default" filter directory might
> > be the one used for the sno_rx_16ab package, so I recompiled the
> > dictionaryCreator GUI to use the "default" filter files and created a new
> > snomed rxnorm dictionary from the 2016ab umls release, but the output is
> > still quite different than the packaged sno_rx_16ab dictionary. From
> > looking at diffs, it looks like there are a substantial number of additions
> > to the sno_rx_16ab, so much so that I really must be missing something. For
> > example, for CUI 12169 which describes a low sodium diet, there are about
> > 27 CUI terms in sno_rx_16ab.script, but in the script generated by the
> > dictionaryGUI there are only 7 (with the "tiny" or "default" filter
> > groups).
> >
> > On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet <re...@foreseemed.com>
> > wrote:
> >
> > > Thanks for the clarifications, Sean. That was very enlightening. I look
> > > forward to the documentation (even if it entails some suffering on your
> > > part.)
> > >
> > > If/when you stumble on some idle time allowing you to implement the manual
> > > edit panel, it would be nice to have it allow for re-partitioning the
> > > ontology. As you are very aware, UMLS CUIs and SNOMED do not always have a
> > > one-to-one correspondence, resulting in a CUI matching multiple SNOMEDs or
> > > a SNOMED being mapped to several CUIs.
> > >
> > > In some cases, clinicians don't agree with that partitioning in specialized
> > > contexts and the inheritance that ensues and would like to re-assign them.
> > >
> > > Not holding my breath, but just something to keep in mind.
> > >
> > >       Remy
> > >
> > > On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> > > Sean.Finan@childrens.harvard.edu> wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > >1) ...
> > > > There are several collections of filter sets here:
> > > >
> > > > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> > > >
> > > > 2) ...
> > > > There is additional logic within the dictionary creator code:
> > > > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> > > >
> > > > I haven't gone through it in a really long time, and without doing so now
> > > > I can't enumerate the filters.  I have family visiting, otherwise my
> > > > curiosity would force me to do so and get back to you.   Honestly, it
> > > > should be documented somewhere, but writing (especially technical) is
> > > > pretty much my least favorite activity.
> > > >
> > > > Sean
> > > >
> > > >
> > > > p.s.
> > > > Please don't wait for it, but I am currently working on new dictionary
> > > > code and plan to introduce that in ctakes.  Again, please don't wait for it
> > > > as it is mixed in with other work and will not be available for several
> > > > months (if at all).
> > > >
> > > >
> > > > ________________________________________
> > > > From: Jeffrey Miller <je...@gmail.com>
> > > > Sent: Sunday, June 16, 2019 9:49 AM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: Differences in dictionary built with dictionaryBuilder
> and
> > > > sno_rx16ab from sourceforge [EXTERNAL]
> > > >
> > > > Hi Sean,
> > > >
> > > > Thanks for your response. I had two follow-up questions that would be very
> > > > helpful to understand if you have a few moments:
> > > >
> > > > 1) Are the specific filters used in the official sno_rx_16ab codified
> > > > anywhere so that I could reproduce them?
> > > >
> > > > 2) Do these filters explain all the changes? For example, when I use the
> > > > dictionary creator to export sno_med and rx_norm, I only get "diabetes
> > > > mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> > > > Especially with the addition of "dm" it feels like I must be missing a step
> > > > or a setting somewhere.
> > > >
> > > > Thanks!
> > > > Jeff
> > > >
> > > > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > > > Sean.Finan@childrens.harvard.edu> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and
> > > > > rxnorm terms with certain semantic types.  Nothing was added, but synonyms
> > > > > are filtered based upon various rules.  For instance, unnecessary suffixes
> > > > > are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> > > > > ("can walk straight line with only minimal assistance"), terms with dose or
> > > > > form are ignored and so forth.
> > > > >
> > > > > Some filters can be changed by adding/removing from prefix/suffix/contains
> > > > > lists in plaintext files or by modifying the dictionary creator code.
> > > > >
> > > > > There was no manual curation (or nothing major).  As Remy mentioned that
> > > > > requires a lot of attention and time.  The dictionary database was not
> > > > > intended to be perfect, just as good as possible without major investment -
> > > > > and reproducible with updates to the umls.
> > > > >
> > > > > As the dictionary is released as a sql database, you should be able to add
> > > > > and remove fairly easily if sql savvy.  I have long wanted to add a "manual
> > > > > edit" panel to the dictionary gui, but haven't had the time.  If anybody
> > > > > else would like to work on such a tool that would be tonic.
> > > > >
> > > > > Sean
> > > > >
> > > > >
> > > > > ________________________________________
> > > > > From: Harish Kulkarni <ha...@gmail.com>
> > > > > Sent: Saturday, June 15, 2019 5:16 PM
> > > > > To: dev@ctakes.apache.org
> > > > > Subject: Re: Differences in dictionary built with dictionaryBuilder
> > and
> > > > > sno_rx16ab from sourceforge [EXTERNAL]
> > > > >
> > > > > unsubscribe
> > > > >
> > > > > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <
> > remys@foreseemed.com>
> > > > > wrote:
> > > > >
> > > > > > Yes, I agree it would be nice because the tokenization that occurs when
> > > > > > creating the dictionaries from the releases makes comparisons a bit tricky
> > > > > > and is not 100% reversible. I would love to hear an answer to your
> > > > > > quandary.
> > > > > >
> > > > > >      Remy
> > > > > >
> > > > > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <
> jeffmax@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > > > > > dictionary had put the differences applied to the default UMLS output into
> > > > > > > version control in some form. I imagine the
> > > > > > > additions/synonyms/abbreviations that were added manually must have been
> > > > > > > collected over time somewhere prior to merging them with 2016ab UMLS
> > > > > > > release? I basically want to recreate the default cTAKES 4.0.0 release with
> > > > > > > an additional ontology and the latest terms. I can likely come up with a
> > > > > > > diff myself but was wondering if this was already maintained as part of
> > > > > > > cTAKES.
> > > > > > >
> > > > > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> > > > remys@foreseemed.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Yes, that's pretty much what we do too. Not only to enhance the dictionary,
> > > > > > > > but to put in corrections because, lo and behold, there are some errors in
> > > > > > > > there! As you know, an ontology is a constant curation job and that
> > > > > > > > script, under SCM, allows you to isolate those changes and, if necessary,
> > > > > > > > re-apply them to new versions.
> > > > > > > >
> > > > > > > >       Remy
> > > > > > > >
> > > > > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > > > > gandhirajan.n@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jeff,
> > > > > > > > >
> > > > > > > > > As far as I know, maintaining a separate SQL script to add additional
> > > > > > > > > entries should work seamlessly.
> > > > > > > > >
> > > > > > > > > On Saturday, June 15, 2019, Jeffrey Miller <
> > jeffmax@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > > > > modifications/synonyms are tracked anywhere (aside from the dictionary
> > > > > > > > > > itself) so they can be carried forward in future dictionary updates?
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > > > > remys@foreseemed.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab is a curated
> > > > > > > > > > > dictionary based on the SNOMED 2016AB release. It does not contain the full
> > > > > > > > > > > set but it has additional edits and synonyms that are pretty useful
> > > > > > > > > > > (including 'dm').
> > > > > > > > > > >
> > > > > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > > > > >
> > > > > > > > > > >       Remy
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > > > > jeffmax@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > I have created a custom dictionary from the latest UMLS release with
> > > > > > > > > > > > SNOMEDCT_US and RxNorm and I've noticed it seems to be generating a .script
> > > > > > > > > > > > file with unexpected differences as compared to the sno_rx_16ab file
> > > > > > > > > > > > available as part of the cTAKES release. Specifically, for diabetes, it is
> > > > > > > > > > > > missing these two rows:
> > > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > > > > >
> > > > > > > > > > > > and only has this one:
> > > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
> > > > > > > > > > > >
> > > > > > > > > > > > The end result is that "diabetes" is not being picked up in the test text I
> > > > > > > > > > > > am running through- it requires the full 'diabetes mellitus'.
> > > > > > > > > > > >
> > > > > > > > > > > > Is there any setting on the UMLS install side or the cTAKES dictionary
> > > > > > > > > > > > creator that could account for missing alternative forms like this? I've
> > > > > > > > > > > > tried downloading the 2016AB release (which I think is the one used to
> > > > > > > > > > > > create the bundled sno_rx_16ab package?) and I am not getting the alternate
> > > > > > > > > > > > forms in that dictionary either.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Jeff
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > > Gandhi
> > > > > > > > >
> > > > > > > > > "The best way to find urself is to lose urself in the
> service
> > > of
> > > > > > others
> > > > > > > > > !!!"
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Hi Sean,

Thanks for the clarification. I think that helps explain some of the
unexpected synonyms that appear in the sno_rx_16ab dictionary. For example,
DM for diabetes mellitus is coming in from another ontology (could be
MEDCIN) that was installed as part of UMLS; it was not manually added to
sno_rx_16ab. I suspect this confusion stems from people who only installed
the subset of UMLS they were interested in, like only installing snomed and
rxnorm using MetamorphoSys. If you do that and compare the resulting cTAKES
dictionary to sno_rx_16ab, it will be missing many synonyms. I did
realize where the "diabete mellitus" was coming from: this is from the
Consumer Health Vocabulary (CHV, also part of UMLS), which intentionally
contains common misspellings and other term usages (see
https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/CHV/).

One thing I noticed: there appears to be a reconciliation process when
processing synonyms from other ontologies in the dictionary creator. It
seems to reduce the number of synonyms for a CUI when the text span of one
term is already covered by another term in the same CUI, but the result can
sometimes be a little odd. For example, when you choose snomed and rxnorm,
but have other ontologies available for synonyms, I think 'diabetes' (from
another ontology, MEDCIN for one, but mapped to the same CUI) ends up
consuming "diabetes mellitus", so that term does not actually appear (you
can see this in sno_rx_16ab), but "diabete mellitus" does persist (likely
because "diabetes" is not contained in that string).

grep -i "'diabetes mellitus'" sno_rx_16ab.script
INSERT INTO PREFTERM VALUES(11849,'Diabetes Mellitus')

There are other examples of similar issues. For example, for CUI 729346,
"juvenile osteochondrosis" is present in a dictionary created with only
snomed installed, but if you also install CHV, it does not make it into the
final dictionary; only these do:

729346|2|3|osteochondropathy - juven|juven
729346|1|2|osteochondritis juvenilis|juvenilis
729346|1|2|juvenile osteochondritis|osteochondritis

A specific example that I have run into involves HPO alone versus a
dictionary created when Snomed was also available for synonyms. In that
case a few oddities arise. For example, "severe short
stature", which is in the HPO, does not make it into the dictionary when
Snomed is installed alongside it using MetamorphoSys, but is in there if HPO
alone is installed.

Out of curiosity, is there a practical difference in the resulting cTAKES
dictionary if you select the Source and Target columns for one ontology
(and nothing else), versus selecting the Source and Target columns for one
ontology and just the Source column for all other installed ontologies? I
know that with the Source of all the ontologies checked, the ontology terms
all end up in the CUI_TERMS table, but since they aren't in any target
table, would the effect be the same as leaving them unchecked (the synonyms
of the unchecked ontologies would be matched when running cTAKES if they
were of the same CUI as the selected ontology)?

Thanks,
Jeff

On Mon, Jun 24, 2019 at 10:58 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> The dictionary creator uses the CUI set from selected sources, but
> synonyms from all available sources for CUIs in that set.
>
> I am not sure what is going on with the 's' in "diabetes".  A grep for
> "diabetes mellitus" and "diabete mellitus" in the umls mrconso file might
> have a hint.  Perhaps some code thinks that it is fixing a plural term?
>
> Sean
> ________________________________________
> From: Jeffrey Miller <je...@gmail.com>
> Sent: Tuesday, June 18, 2019 10:23 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks Sean. I actually think I figured out what is causing the difference.
> When I create the UMLS install on my machine, I only install RxNorm and
> SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
> two sources on the left. I noticed in the screenshots on the wiki page for
> the dictionary creator GUI that many sources were installed, but only
> SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
> active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
> dictionaryCreator GUI) and it made a difference as to which terms appeared
> in the final cTAKES dictionary. As an example, I now get the "DM" entry for
> diabetes. I don't know why this should make a difference, but it appears
> that it does.
>
> Another odd observation related to this. In the sno_rx_16ab file, I
> noticed there seems to be an error:
> INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')
>
> The 's' is missing from diabetes. When I created my dictionary (from the
> restricted UMLS install, but still 2016ab) the cTAKES dictionary entry for
> that term is correct:
> INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
>
> When I created the dictionary from the full UMLS install tonight, that
> error appeared again.
>
> Jeff
>
>
>
> On Mon, Jun 17, 2019 at 8:08 PM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years
> > ago I can't swear to any of those filter sets being exactly what was used.
> >
> > I think that the key to working with any project is to check the
> > dictionary against a project's needs.  Fill in the gaps by either editing
> > the sql (.script) file or by adding a second dictionary.  In smaller
> > "focus" projects I usually end up augmenting the default dictionary with a
> > small custom bsv dictionary to catch any known synonyms or terms that
> > aren't represented in the default.  In projects requiring larger nets I
> > have built dictionaries that are horribly inclusive - 2 to 3 times the
> > sno_rx_16ab.
> >
> > Sean
> > ________________________________________
> > From: Jeffrey Miller <je...@gmail.com>
> > Sent: Monday, June 17, 2019 4:39 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > Thanks for following up Sean. I've looked into the links you sent along.
> > There are different groups of filters and it appears that the
> > dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
> > directory. I don't think this is the set of filters used to make
> > sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
> > veterinary product.  310367) in "UnwantedTexts.txt", but the
> > sno_rx_16ab.script file has "today" still in there. If you create a
> > dictionary with the dictionary builder, it does not include that term.
> >
> > I thought maybe the set of files under the "default" filter directory might
> > be the one used for the sno_rx_16ab package, so I recompiled the
> > dictionaryCreator GUI to use the "default" filter files and created a new
> > snomed rxnorm dictionary from the 2016ab umls release, but the output is
> > still quite different than the packaged sno_rx_16ab dictionary. From
> > looking at diffs, it looks like there are a substantial number of additions
> > to the sno_rx_16ab, so much so that I really must be missing something. For
> > example, for CUI 12169 which describes a low sodium diet, there are about
> > 27 CUI terms in sno_rx_16ab.script, but in the script generated by the
> > dictionaryGUI there are only 7 (with the "tiny" or "default" filter
> > groups).
> >
> > On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet <re...@foreseemed.com>
> > wrote:
> >
> > > Thanks for the clarifications, Sean. That was very enlightening. I look
> > > forward to the documentation (even if it entails some suffering on your
> > > part.)
> > >
> > > If/when you stumble on some idle time allowing you to implement the manual
> > > edit panel, it would be nice to have it allow for re-partitioning the
> > > ontology. As you are very aware, UMLS CUIs and SNOMED do not always have a
> > > one-to-one correspondence, resulting in a CUI matching multiple SNOMEDs or
> > > a SNOMED being mapped to several CUIs.
> > >
> > > In some cases, clinicians don't agree with that partitioning in specialized
> > > contexts and the inheritance that ensues and would like to re-assign them.
> > >
> > > Not holding my breath, but just something to keep in mind.
> > >
> > >       Remy
> > >
> > > On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> > > Sean.Finan@childrens.harvard.edu> wrote:
> > >
> > > > Hi Jeff,
> > > >
> > > > >1) ...
> > > > There are several collections of filter sets here:
> > > >
> > > > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> > > >
> > > > 2) ...
> > > > There is additional logic within the dictionary creator code:
> > > > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> > > >
> > > > I haven't gone through it in a really long time, and without doing so now
> > > > I can't enumerate the filters.  I have family visiting, otherwise my
> > > > curiosity would force me to do so and get back to you.   Honestly, it
> > > > should be documented somewhere, but writing (especially technical) is
> > > > pretty much my least favorite activity.
> > > >
> > > > Sean
> > > >
> > > >
> > > > p.s.
> > > > Please don't wait for it, but I am currently working on new dictionary
> > > > code and plan to introduce that in ctakes.  Again, please don't wait for it
> > > > as it is mixed in with other work and will not be available for several
> > > > months (if at all).
> > > >
> > > >
> > > > ________________________________________
> > > > From: Jeffrey Miller <je...@gmail.com>
> > > > Sent: Sunday, June 16, 2019 9:49 AM
> > > > To: dev@ctakes.apache.org
> > > > Subject: Re: Differences in dictionary built with dictionaryBuilder
> and
> > > > sno_rx16ab from sourceforge [EXTERNAL]
> > > >
> > > > Hi Sean,
> > > >
> > > > Thanks for your response. I had two follow-up questions that would be very
> > > > helpful to understand if you have a few moments:
> > > >
> > > > 1) Are the specific filters used in the official sno_rx_16ab codified
> > > > anywhere so that I could reproduce them?
> > > >
> > > > 2) Do these filters explain all the changes? For example, when I use the
> > > > dictionary creator to export sno_med and rx_norm, I only get "diabetes
> > > > mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> > > > Especially with the addition of "dm" it feels like I must be missing a step
> > > > or a setting somewhere.
> > > >
> > > > Thanks!
> > > > Jeff
> > > >
> > > > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > > > Sean.Finan@childrens.harvard.edu> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > The contents of the sno_rx_16ab are a dump of the umls 2016AB
> snomed
> > > and
> > > > > rxnorm terms with certain symantic types.  Nothing was added, but
> > > > synonyms
> > > > > are filtered based upon various rules.  For instance, unnecessary
> > > > suffixes
> > > > > are removed ("Wart (Finding)" -> "Wart"), really long terms are
> > > excluded
> > > > > ("can walk straight line with only minimal assistance"), terms with
> > > dose
> > > > or
> > > > > form are ignored and so forth.
> > > > >
> > > > > Some filters can be changed by adding/removing from
> > > > prefix/suffix/contains
> > > > > lists in plaintext files or by modifying the dictionary creator
> code.
> > > > >
> > > > > There was no manual curation (or nothing major).  As Remy mentioned
> > > that
> > > > > requires a lot of attention and time.  The dictionary database was
> > not
> > > > > intended to be perfect, just as good as possible without major
> > > > investment -
> > > > > and reproducible with updates to the umls.
> > > > >
> > > > > As the dictionary is released as a sql database, you should be able
> > to
> > > > add
> > > > > and remove fairly easily if sql savvy.  I have long wanted to add a
> > > > "manual
> > > > > edit" panel to the dictionary gui, but haven't had the time.  If
> > > anybody
> > > > > else would like to work on such a tool that would be tonic.
> > > > >
> > > > > Sean
> > > > >
> > > > >
> > > > > ________________________________________
> > > > > From: Harish Kulkarni <ha...@gmail.com>
> > > > > Sent: Saturday, June 15, 2019 5:16 PM
> > > > > To: dev@ctakes.apache.org
> > > > > Subject: Re: Differences in dictionary built with dictionaryBuilder
> > and
> > > > > sno_rx16ab from sourceforge [EXTERNAL]
> > > > >
> > > > > unsubscribe
> > > > >
> > > > > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <
> > remys@foreseemed.com>
> > > > > wrote:
> > > > >
> > > > > > Yes, I agree it would be nice because the tokenization that
> occurs
> > > when
> > > > > > creating the dictionaries from the releases make comparisons a
> bit
> > > > tricky
> > > > > > and is not 100% reversible. I would love to hear an answer to
> your
> > > > > > quandary.
> > > > > >
> > > > > >      Remy
> > > > > >
> > > > > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <
> jeffmax@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > Thanks, I was curious if the cTAKES devs that created the
> > > sno_rx_16ab
> > > > > > > dictionary had put the differences applied to the default UMLS
> > > output
> > > > > > into
> > > > > > > version control in some form. I imagine the
> > > > > > > additions/synonyms/abbreviations that were added manually must
> > have
> > > > > been
> > > > > > > collected over time somewhere prior to merging them with 2016ab
> > > UMLS
> > > > > > > release? I basically want to recreate the default cTAKES 4.0.0
> > > > release
> > > > > > with
> > > > > > > an additional ontology and the latest terms. I can likely come
> up
> > > > with
> > > > > a
> > > > > > > diff myself but was wondering if this was already maintained as
> > > part
> > > > of
> > > > > > > cTAKES.
> > > > > > >
> > > > > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> > > > remys@foreseemed.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Yes, that's pretty much what we do too. Not only to enhance
> the
> > > > > > > dictionary,
> > > > > > > > but to put in corrections because, lo and behold, there are
> > some
> > > > > errors
> > > > > > > in
> > > > > > > > there!. As you know, an ontology is a constant curation job
> and
> > > > that
> > > > > > > > script, under SCM, allows you to isolate those changes and,
> if
> > > > > > necessary,
> > > > > > > > re-apply them to new versions.
> > > > > > > >
> > > > > > > >       Remy
> > > > > > > >
> > > > > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > > > > gandhirajan.n@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jeff,
> > > > > > > > >
> > > > > > > > > As far as I know, maintaining a separate SQL script to add
> > > > > additional
> > > > > > > > > entries should work seamlessly.
> > > > > > > > >
> > > > > > > > > On Saturday, June 15, 2019, Jeffrey Miller <
> > jeffmax@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > > > > modifications/synonyms are tracked anywhere (aside from
> the
> > > > > > > dictionary
> > > > > > > > > > itself) so they can be carried forward in future
> dictionary
> > > > > > updates?
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > > > > remys@foreseemed.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > From my experience, it seems pretty obvious that
> > > sno_rx_16ab
> > > > > is a
> > > > > > > > > curated
> > > > > > > > > > > dictionary based on the SNOMED 2016AB release. It does
> > not
> > > > > > contain
> > > > > > > > the
> > > > > > > > > > full
> > > > > > > > > > > set but it has additional edits and synonyms that are
> > > pretty
> > > > > > useful
> > > > > > > > > > > (including 'dm').
> > > > > > > > > > >
> > > > > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > > > > >
> > > > > > > > > > >       Remy
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > > > > jeffmax@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > > I have created a custom dictionary from the latest
> UMLS
> > > > > release
> > > > > > > > with
> > > > > > > > > > > > SNOMEDCT_US and  RxNorm and I've noticed it seems to
> be
> > > > > > > generating
> > > > > > > > > > > .script
> > > > > > > > > > > > file with unexpected differences as compared to the
> > > > > sno_rx_16ab
> > > > > > > > file
> > > > > > > > > > > > available as part of the cTAKES release.
> Specifically,
> > > for
> > > > > > > > diabetes,
> > > > > > > > > it
> > > > > > > > > > > is
> > > > > > > > > > > > missing these two rows:
> > > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > > > > INSERT INTO CUI_TERMS
> > > > VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > > > > >
> > > > > > > > > > > > and only has this one:
> > > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes
> > > > > > > > > mellitus','mellitus')
> > > > > > > > > > > >
> > > > > > > > > > > > The end result is that "diabetes" is not being picked
> > up
> > > in
> > > > > the
> > > > > > > > test
> > > > > > > > > > > text I
> > > > > > > > > > > > am running through- it requires the full 'diabetes
> > > > mellitus'.
> > > > > > > > > > > >
> > > > > > > > > > > > Is there any setting on the UMLS install side or the
> > > > ctTAKES
> > > > > > > > > dictionary
> > > > > > > > > > > > creator that could account for missing alternative
> > forms
> > > > like
> > > > > > > this?
> > > > > > > > > > I've
> > > > > > > > > > > > tried downloading the 2016AB release (which I think
> is
> > > the
> > > > > one
> > > > > > > used
> > > > > > > > > to
> > > > > > > > > > > > create the bundled sno_rx_16ab package?) and I am not
> > > > getting
> > > > > > the
> > > > > > > > > > > alternate
> > > > > > > > > > > > forms in that dictionary either.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Jeff
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Regards,
> > > > > > > > > Gandhi
> > > > > > > > >
> > > > > > > > > "The best way to find urself is to lose urself in the
> service
> > > of
> > > > > > others
> > > > > > > > > !!!"
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

The dictionary creator takes its CUI set from the selected sources, but it pulls synonyms for the CUIs in that set from all available sources.
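
As a rough illustration of that rule, here is a minimal Python sketch. It is not the dictionary creator's actual code: the rows, the OTHER_VOCAB source name, and the build_terms helper are all made up for the example, and the idea that "dm" and "diabetes" come from a vocabulary other than SNOMEDCT_US is an assumption.

```python
from collections import defaultdict

# Toy (CUI, source vocabulary, term) rows standing in for an installed UMLS subset.
# The source attached to each term is invented for illustration only.
ROWS = [
    ("C0011849", "SNOMEDCT_US", "diabetes mellitus"),
    ("C0011849", "OTHER_VOCAB", "dm"),
    ("C0011849", "OTHER_VOCAB", "diabetes"),
    ("C9999999", "OTHER_VOCAB", "unrelated term"),  # hypothetical CUI, not in SNOMEDCT_US
]

def build_terms(rows, selected_sources):
    """Return {cui: sorted terms}: CUIs come from selected sources, terms from all rows."""
    wanted_cuis = {cui for cui, sab, _ in rows if sab in selected_sources}
    terms = defaultdict(set)
    for cui, _sab, term in rows:
        if cui in wanted_cuis:
            terms[cui].add(term.lower())
    return {cui: sorted(found) for cui, found in terms.items()}

# Every vocabulary installed, only SNOMEDCT_US selected.
full_install = build_terms(ROWS, {"SNOMEDCT_US"})

# Only SNOMEDCT_US installed, so rows from other vocabularies never reach the builder.
snomed_only_rows = [row for row in ROWS if row[1] == "SNOMEDCT_US"]
restricted_install = build_terms(snomed_only_rows, {"SNOMEDCT_US"})
```

Under those assumptions the same selection gives a smaller synonym set when fewer vocabularies are installed, which would match the behavior described in this thread.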

I am not sure what is going on with the 's' in "diabetes".  A grep for "diabetes mellitus" and "diabete mellitus" in the UMLS MRCONSO file might offer a hint.  Perhaps some code thinks that it is fixing a plural term?
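
A small Python equivalent of that grep, as a sketch only. It relies on the standard pipe-delimited MRCONSO.RRF layout (CUI in field 0, SAB in field 11, STR in field 14). The find_terms helper, the FAKE_SAB source, and the two sample rows are invented so that the example runs without a UMLS install.

```python
import io

# Standard MRCONSO.RRF layout: pipe-delimited, CUI in field 0,
# source abbreviation (SAB) in field 11, term string (STR) in field 14.
CUI, SAB, STR = 0, 11, 14

def find_terms(lines, needles):
    """Yield (cui, sab, string) for rows whose STR equals one of the needles, ignoring case."""
    wanted = {needle.lower() for needle in needles}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > STR and fields[STR].lower() in wanted:
            yield fields[CUI], fields[SAB], fields[STR]

# Two made-up rows in MRCONSO shape; identifiers such as L1, S1, A1, X0 are placeholders.
sample = io.StringIO(
    "C0011849|ENG|P|L1|PF|S1|Y|A1||||SNOMEDCT_US|PT|X0|Diabetes mellitus|9|N||\n"
    "C0011849|ENG|S|L2|PF|S2|Y|A2||||FAKE_SAB|SY|X1|Diabete mellitus|0|N||\n"
)
hits = list(find_terms(sample, ["diabetes mellitus", "diabete mellitus"]))
```

Against a real install, the same function can be fed an open file handle, e.g. open("MRCONSO.RRF", encoding="utf-8"), and the SAB column of any "diabete mellitus" hit would show which vocabulary, if any, carries the spelling.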

Sean

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Thanks Sean. I actually think I figured out what is causing the difference.
When I create the UMLS install on my machine, I only install RxNorm and
SNOMEDCT_US, so when I use the dictionaryCreator GUI, there are only those
two sources on the left. I noticed in the screenshots on the wiki page for
the dictionary creator GUI that many sources were installed, but only
SNOMEDCT_US and RxNorm were selected. So, I tried installing all of the
active UMLS set (but still only selecting RxNorm and SNOMEDCT_US in the
dictionaryCreator GUI) and it made a difference as to which terms appeared
in the final cTAKES dictionary. As an example, I now get the "DM" entry for
diabetes. I don't know why this should make a difference, but it appears
that it does.

Another odd observation related to this: in the sno_rx_16ab file, I
noticed there seems to be an error:
INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')

The 's' is missing from diabetes. When I created my dictionary (from the
restricted UMLS install, but still 2016ab) the cTAKES dictionary entry for
that term is correct:
INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')

When I created the dictionary from the full UMLS install tonight, that
error appeared again.
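
For anyone who wants to repeat this comparison, a minimal Python sketch of a per-CUI diff between two .script files follows. It assumes the CUI_TERMS rows look exactly like the ones quoted in this thread and that single quotes inside a term are doubled. The helper names are made up, and the sample rows are the ones shown above rather than full files.

```python
import re
from collections import defaultdict

# Matches rows of the shape quoted in this thread, e.g.
# INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')
ROW = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),\d+,\d+,'((?:[^']|'')*)','(?:[^']|'')*'\)"
)

def terms_by_cui(lines):
    """Map each CUI number to the set of term texts found in CUI_TERMS inserts."""
    terms = defaultdict(set)
    for line in lines:
        match = ROW.search(line)
        if match:
            terms[int(match.group(1))].add(match.group(2))
    return terms

def only_in_first(first, second):
    """Return {cui: sorted terms} that appear in first but not in second."""
    diff = {}
    for cui, terms in first.items():
        missing = terms - second.get(cui, set())
        if missing:
            diff[cui] = sorted(missing)
    return diff

# Rows reported for the bundled dictionary versus the restricted rebuild.
bundled = terms_by_cui([
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')",
    "INSERT INTO CUI_TERMS VALUES(11849,0,1,'diabetes','diabetes')",
    "INSERT INTO CUI_TERMS VALUES(11849,0,2,'diabete mellitus','diabete')",
])
rebuilt = terms_by_cui([
    "INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')",
])

missing_from_rebuilt = only_in_first(bundled, rebuilt)
missing_from_bundled = only_in_first(rebuilt, bundled)
```

With real files, passing open file handles for the two .script files to terms_by_cui should work the same way, since lines that are not CUI_TERMS inserts are skipped.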

Jeff



On Mon, Jun 17, 2019 at 8:08 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years
> ago I can't swear to any of those filter sets being exactly what was used.
>
> I think that the key to working with any project is to check the
> dictionary against a project's needs.  Fill in the gaps by either editing
> the sql (.script) file or by adding a second dictionary.  In smaller
> "focus" projects I usually end up augmenting the default dictionary with a
> small custom bsv dictionary to catch any known synonyms or terms that
> aren't represented in the default.  In projects requiring larger nets I
> have built dictionaries that are horribly inclusive - 2 to 3 times the
> sno_rx_16ab.
>
> Sean
> ________________________________________
> From: Jeffrey Miller <je...@gmail.com>
> Sent: Monday, June 17, 2019 4:39 PM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Thanks for following up Sean. I've looked into the links you sent along.
> There are different groups of filters and it appears that the
> dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
> directory. I don't think this is the set of filters used to make
> sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
> veterinary product.  310367) in "UnwantedTexts.txt", but the
> sno_rx_16ab.script file has "today" still in there. If you create a
> dictionary with the dictionary builder, it does not include that term.
>
> I thought maybe the set of files under the "default" filter directory might
> be the one used for the sno_rx_16ab package so I recompiled the
> dictionaryCreator GUI to use the "default" filter files and created a new
> snomed rxnorm dictionary from the 2016ab umls release, but the output is
> still quite different than the packaged sno_rx_16ab dictionary. From
> looking at diffs, it looks like there are a substantial number of additions
> to the sno_rx_16ab, so much so that I really must be missing something. For
> example, for CUI 12169 which describes a low sodium diet, there are about
> 27 CUI terms in sno_rx_16ab.script, but in the script generated by the
> dictionaryGUI there are only 7 (with the "tiny" or "default" filter
> groups).
>
> On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet <re...@foreseemed.com>
> wrote:
>
> > Thanks for the clarifications, Sean. That was very enlightening. I look
> > forward to the documentation (even if it entails some suffering on your
> > part.)
> >
> > If/when you stumble on some idle time allowing you to implement the
> > manual edit panel, it would be nice to have it allow for re-partitioning
> > the ontology. As you are very aware, UMLS CUIs and SNOMED do not always
> > have a one-to-one correspondence, resulting in a CUI matching multiple
> > SNOMEDs or a SNOMED being mapped to several CUIs.
> >
> > In some cases, clinicians don't agree with that partitioning in
> > specialized contexts and the inheritance that ensues and would like to
> > re-assign them.
> >
> > Not holding my breath, but just something to keep in mind.
> >
> >       Remy
> >
> > On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > > Hi Jeff,
> > >
> > > >1) ...
> > > There are several collections of filter sets here:
> > > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> > >
> > > 2) ...
> > > There is additional logic within the dictionary creator code:
> > > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> > >
> > > I haven't gone through it in a really long time, and without doing so
> > > now I can't enumerate the filters.  I have family visiting, otherwise my
> > > curiosity would force me to do so and get back to you.   Honestly, it
> > > should be documented somewhere, but writing (especially technical) is
> > > pretty much my least favorite activity.
> > >
> > > Sean
> > >
> > >
> > > p.s.
> > > Please don't wait for it, but I am currently working on new dictionary
> > > code and plan to introduce that in ctakes.  Again, please don't wait
> > > for it as it is mixed in with other work and will not be available for
> > > several months (if at all).
> > >
> > >
> > > ________________________________________
> > > From: Jeffrey Miller <je...@gmail.com>
> > > Sent: Sunday, June 16, 2019 9:49 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > > sno_rx16ab from sourceforge [EXTERNAL]
> > >
> > > Hi Sean,
> > >
> > > Thanks for your response. I had two follow-up questions that would be
> > > very helpful to understand if you have a few moments:
> > >
> > > 1) Are the specific filters used in the official sno_rx_16ab codified
> > > anywhere so that I could reproduce them?
> > >
> > > 2) Do these filters explain all the changes? For example, when I use
> > > the dictionary creator to export sno_med and rx_norm, I only get
> > > "diabetes mellitus", whereas sno_rx_16ab contains both "diabetes" and
> > > "dm". Especially with the addition of "dm" it feels like I must be
> > > missing a step or a setting somewhere.
> > >
> > > Thanks!
> > > Jeff
> > >
> > > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > > Sean.Finan@childrens.harvard.edu> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed
> > > > and rxnorm terms with certain semantic types.  Nothing was added, but
> > > > synonyms are filtered based upon various rules.  For instance,
> > > > unnecessary suffixes are removed ("Wart (Finding)" -> "Wart"), really
> > > > long terms are excluded ("can walk straight line with only minimal
> > > > assistance"), terms with dose or form are ignored and so forth.
> > > >
> > > > Some filters can be changed by adding/removing from
> > > > prefix/suffix/contains lists in plaintext files or by modifying the
> > > > dictionary creator code.
> > > >
> > > > There was no manual curation (or nothing major).  As Remy mentioned,
> > > > that requires a lot of attention and time.  The dictionary database
> > > > was not intended to be perfect, just as good as possible without major
> > > > investment - and reproducible with updates to the umls.
> > > >
> > > > As the dictionary is released as a sql database, you should be able
> > > > to add and remove fairly easily if sql savvy.  I have long wanted to
> > > > add a "manual edit" panel to the dictionary gui, but haven't had the
> > > > time.  If anybody else would like to work on such a tool that would be
> > > > tonic.
> > > >
> > > > Sean
> > > >
> > > >
> > > > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <
> remys@foreseemed.com>
> > > > wrote:
> > > >
> > > > > Yes, I agree it would be nice because the tokenization that occurs
> > when
> > > > > creating the dictionaries from the releases make comparisons a bit
> > > tricky
> > > > > and is not 100% reversible. I would love to hear an answer to your
> > > > > quandary.
> > > > >
> > > > >      Remy
> > > > >
> > > > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <je...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Thanks, I was curious if the cTAKES devs that created the
> > sno_rx_16ab
> > > > > > dictionary had put the differences applied to the default UMLS
> > output
> > > > > into
> > > > > > version control in some form. I imagine the
> > > > > > additions/synonyms/abbreviations that were added manually must
> have
> > > > been
> > > > > > collected over time somewhere prior to merging them with 2016ab
> > UMLS
> > > > > > release? I basically want to recreate the default cTAKES 4.0.0
> > > release
> > > > > with
> > > > > > an additional ontology and the latest terms. I can likely come up
> > > with
> > > > a
> > > > > > diff myself but was wondering if this was already maintained as
> > part
> > > of
> > > > > > cTAKES.
> > > > > >
> > > > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> > > remys@foreseemed.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > > > dictionary,
> > > > > > > but to put in corrections because, lo and behold, there are
> some
> > > > errors
> > > > > > in
> > > > > > > there!. As you know, an ontology is a constant curation job and
> > > that
> > > > > > > script, under SCM, allows you to isolate those changes and, if
> > > > > necessary,
> > > > > > > re-apply them to new versions.
> > > > > > >
> > > > > > >       Remy
> > > > > > >
> > > > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > > > gandhirajan.n@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jeff,
> > > > > > > >
> > > > > > > > As far as I know, maintaining a separate SQL script to add
> > > > additional
> > > > > > > > entries should work seamlessly.
> > > > > > > >
> > > > > > > > On Saturday, June 15, 2019, Jeffrey Miller <
> jeffmax@gmail.com>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > > dictionary
> > > > > > > > > itself) so they can be carried forward in future dictionary
> > > > > updates?
> > > > > > > > >
> > > > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > > > remys@foreseemed.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > From my experience, it seems pretty obvious that
> > sno_rx_16ab
> > > > is a
> > > > > > > > curated
> > > > > > > > > > dictionary based on the SNOMED 2016AB release. It does
> not
> > > > > contain
> > > > > > > the
> > > > > > > > > full
> > > > > > > > > > set but it has additional edits and synonyms that are
> > pretty
> > > > > useful
> > > > > > > > > > (including 'dm').
> > > > > > > > > >
> > > > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > > > >
> > > > > > > > > >       Remy
> > > > > > > > > >
> > > > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > > > jeffmax@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > > I have created a custom dictionary from the latest UMLS
> > > > release
> > > > > > > with
> > > > > > > > > > > SNOMEDCT_US and  RxNorm and I've noticed it seems to be
> > > > > > generating
> > > > > > > > > > .script
> > > > > > > > > > > file with unexpected differences as compared to the
> > > > sno_rx_16ab
> > > > > > > file
> > > > > > > > > > > available as part of the cTAKES release. Specifically,
> > for
> > > > > > > diabetes,
> > > > > > > > it
> > > > > > > > > > is
> > > > > > > > > > > missing these two rows:
> > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > > > INSERT INTO CUI_TERMS
> > > VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > > > >
> > > > > > > > > > > and only has this one:
> > > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes
> > > > > > > > mellitus','mellitus')
> > > > > > > > > > >
> > > > > > > > > > > The end result is that "diabetes" is not being picked
> up
> > in
> > > > the
> > > > > > > test
> > > > > > > > > > text I
> > > > > > > > > > > am running through- it requires the full 'diabetes
> > > mellitus'.
> > > > > > > > > > >
> > > > > > > > > > > Is there any setting on the UMLS install side or the
> > > cTAKES
> > > > > > > > dictionary
> > > > > > > > > > > creator that could account for missing alternative
> forms
> > > like
> > > > > > this?
> > > > > > > > > I've
> > > > > > > > > > > tried downloading the 2016AB release (which I think is
> > the
> > > > one
> > > > > > used
> > > > > > > > to
> > > > > > > > > > > create the bundled sno_rx_16ab package?) and I am not
> > > getting
> > > > > the
> > > > > > > > > > alternate
> > > > > > > > > > > forms in that dictionary either.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Jeff
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Regards,
> > > > > > > > Gandhi
> > > > > > > >
> > > > > > > > "The best way to find urself is to lose urself in the service
> > of
> > > > > others
> > > > > > > > !!!"
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

Thanks for doing the research.  Since the sno_rx_16ab was made 3+ years ago I can't swear to any of those filter sets being exactly what was used.

I think the key in any project is to check the dictionary against that project's needs and fill in the gaps, either by editing the SQL (.script) file or by adding a second dictionary.  In smaller "focus" projects I usually end up augmenting the default dictionary with a small custom BSV dictionary to catch any known synonyms or terms that aren't represented in the default.  In projects requiring larger nets I have built dictionaries that are horribly inclusive, 2 to 3 times the size of sno_rx_16ab.
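
For the "edit the .script file" route, a minimal sketch in Python (not part of cTAKES) of generating extra rows in the same shape as the CUI_TERMS rows quoted at the start of this thread. The function name and the column interpretation are assumptions inferred from those quoted rows (CUI, position of the lookup word within the term, token count, full term, lookup word), and splitting on whitespace is a simplification of whatever tokenization the dictionary creator really applies.

```python
def cui_terms_insert(cui: int, text: str, lookup_word: str) -> str:
    """Build one INSERT line shaped like the CUI_TERMS rows quoted in the thread.

    Assumed column order (inferred from the quoted rows, not verified against
    the cTAKES schema): CUI, index of the lookup word in the term, number of
    tokens in the term, the full term, the lookup word.
    """
    tokens = text.lower().split()  # simplification: plain whitespace tokens
    word = lookup_word.lower()
    if word not in tokens:
        raise ValueError(f"{lookup_word!r} is not a token of {text!r}")

    def quote(value: str) -> str:
        # Standard SQL escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"

    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, tokens.index(word), len(tokens), quote(" ".join(tokens)), quote(word)
    )
```

Calling it with (11849, "dm", "dm") and (11849, "diabetes mellitus", "mellitus") reproduces two of the rows quoted earlier. Keeping such a generator script under version control is one way to re-apply the same additions after each dictionary rebuild.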

Sean
________________________________________
From: Jeffrey Miller <je...@gmail.com>
Sent: Monday, June 17, 2019 4:39 PM
To: dev@ctakes.apache.org
Subject: Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Thanks for following up Sean. I've looked into the links you sent along.
There are different groups of filters and it appears that the
dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
directory. I don't think this is the set of filters used to make
sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
veterinary product.  310367) in "UnwantedTexts.txt", but the
sno_rx_16ab.script file has "today" still in there. If you create a
dictionary with the dictionary builder, it does not include that term.

I thought maybe the set of files under the "default" filter directory might
be the one used for the sno_rx_16ab package so I recompiled the
dictionaryCreator GUI to use the "default" filter files and created a new
snomed rxnorm dictionary from the 2016ab umls release, but the output is
still quite different than the packaged sno_rx_16ab dictionary. From
looking at diffs, it looks like there are a substantial number of additions
to the sno_rx_16ab, so much so that I really must be missing something. For
example, for CUI 12169 which describes a low sodium diet, there are about
27 CUI terms in sno_rx_16ab.script, but in the script generated by the
dictionaryGUI there are only 7 (with the "tiny" or "default" filter groups).

On Sun, Jun 16, 2019 at 3:27 PM Remy Sanouillet <re...@foreseemed.com>
wrote:

> Thanks for the clarifications, Sean. That was very enlightening. I look
> forward to the documentation (even if it entails some suffering on your
> part.)
>
> If/when you stumble on some idle time allowing you to implement the manual
> edit panel, it would be nice to have it allow for re-partitioning the
> ontology. As you are very aware, UMLS CUIs and SNOMED do not always have a
> one-to-one correspondence resulting in a CUI matching multiple SNOMEDs or
> a SNOMED being mapped to several CUIs.
>
> In some cases, clinicians don't agree with that partitioning in specialized
> contexts and the inheritance that ensues and would like to re-assign them.
>
> Not holding my breath, but just something to keep in mind.
>
>       Remy
>
> On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Jeff,
> >
> > >1) ...
> > There are several collections of filter sets here:
> > ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
> >
> > 2) ...
> > There is additional logic within the dictionary creator code:
> > ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
> >
> > I haven't gone through it in a really long time, and without doing so now
> > I can't enumerate the filters.  I have family visiting, otherwise my
> > curiosity would force me to do so and get back to you.   Honestly, it
> > should be documented somewhere, but writing (especially technical) is
> > pretty much my least favorite activity.
> >
> > Sean
> >
> >
> > p.s.
> > Please don't wait for it, but I am currently working on new dictionary
> > code and plan to introduce that in ctakes.  Again, please don't wait for
> it
> > as it is mixed in with other work and will not be available for several
> > months (if at all).
> >
> >
> > ________________________________________
> > From: Jeffrey Miller <je...@gmail.com>
> > Sent: Sunday, June 16, 2019 9:49 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > Hi Sean,
> >
> > Thanks for your response. I had two follow-up questions that would be
> very
> > helpful to understand if you have a few moments:
> >
> > 1) Are the specific filters used in the official sno_rx_16ab codified
> > anywhere so that I could reproduce them?
> >
> > 2) Do these filters explain all the changes? For example, when I use the
> > dictionary creator to export sno_med and rx_norm, I only get "diabetes
> > mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> > Especially with the addition of "dm" it feels like I must be missing a
> step
> > or a setting somewhere.
> >
> > Thanks!
> > Jeff
> >
> > On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > > Hi all,
> > >
> > > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed
> and
> > > rxnorm terms with certain semantic types.  Nothing was added, but
> > synonyms
> > > are filtered based upon various rules.  For instance, unnecessary
> > suffixes
> > > are removed ("Wart (Finding)" -> "Wart"), really long terms are
> excluded
> > > ("can walk straight line with only minimal assistance"), terms with
> dose
> > or
> > > form are ignored and so forth.
> > >
> > > Some filters can be changed by adding/removing from
> > prefix/suffix/contains
> > > lists in plaintext files or by modifying the dictionary creator code.
> > >
> > > There was no manual curation (or nothing major).  As Remy mentioned
> that
> > > requires a lot of attention and time.  The dictionary database was not
> > > intended to be perfect, just as good as possible without major
> > investment -
> > > and reproducible with updates to the umls.
> > >
> > > As the dictionary is released as a sql database, you should be able to
> > add
> > > and remove fairly easily if sql savvy.  I have long wanted to add a
> > "manual
> > > edit" panel to the dictionary gui, but haven't had the time.  If
> anybody
> > > else would like to work on such a tool that would be tonic.
> > >
> > > Sean
> > >
> > >
> > > ________________________________________
> > > From: Harish Kulkarni <ha...@gmail.com>
> > > Sent: Saturday, June 15, 2019 5:16 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > > sno_rx16ab from sourceforge [EXTERNAL]
> > >
> > > unsubscribe
> > >
> > > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <re...@foreseemed.com>
> > > wrote:
> > >
> > > > Yes, I agree it would be nice because the tokenization that occurs
> when
> > > > creating the dictionaries from the releases make comparisons a bit
> > tricky
> > > > and is not 100% reversible. I would love to hear an answer to your
> > > > quandary.
> > > >
> > > >      Remy
> > > >
> > > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <je...@gmail.com>
> > > wrote:
> > > >
> > > > > Thanks, I was curious if the cTAKES devs that created the
> sno_rx_16ab
> > > > > dictionary had put the differences applied to the default UMLS
> output
> > > > into
> > > > > version control in some form. I imagine the
> > > > > additions/synonyms/abbreviations that were added manually must have
> > > been
> > > > > collected over time somewhere prior to merging them with 2016ab
> UMLS
> > > > > release? I basically want to recreate the default cTAKES 4.0.0
> > release
> > > > with
> > > > > an additional ontology and the latest terms. I can likely come up
> > with
> > > a
> > > > > diff myself but was wondering if this was already maintained as
> part
> > of
> > > > > cTAKES.
> > > > >
> > > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> > remys@foreseemed.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > > dictionary,
> > > > > > but to put in corrections because, lo and behold, there are some
> > > errors
> > > > > in
> > > > > > there!. As you know, an ontology is a constant curation job and
> > that
> > > > > > script, under SCM, allows you to isolate those changes and, if
> > > > necessary,
> > > > > > re-apply them to new versions.
> > > > > >
> > > > > >       Remy
> > > > > >
> > > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > > gandhirajan.n@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jeff,
> > > > > > >
> > > > > > > As far as I know, maintaining a separate SQL script to add
> > > additional
> > > > > > > entries should work seamlessly.
> > > > > > >
> > > > > > > On Saturday, June 15, 2019, Jeffrey Miller <je...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > > dictionary
> > > > > > > > itself) so they can be carried forward in future dictionary
> > > > updates?
> > > > > > > >
> > > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > > remys@foreseemed.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > From my experience, it seems pretty obvious that
> sno_rx_16ab
> > > is a
> > > > > > > curated
> > > > > > > > > dictionary based on the SNOMED 2016AB release. It does not
> > > > contain
> > > > > > the
> > > > > > > > full
> > > > > > > > > set but it has additional edits and synonyms that are
> pretty
> > > > useful
> > > > > > > > > (including 'dm').
> > > > > > > > >
> > > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > > >
> > > > > > > > >       Remy
> > > > > > > > >
> > > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > > jeffmax@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > > I have created a custom dictionary from the latest UMLS
> > > release
> > > > > > with
> > > > > > > > > > SNOMEDCT_US and  RxNorm and I've noticed it seems to be
> > > > > generating
> > > > > > > > > .script
> > > > > > > > > > file with unexpected differences as compared to the
> > > sno_rx_16ab
> > > > > > file
> > > > > > > > > > available as part of the cTAKES release. Specifically,
> for
> > > > > > diabetes,
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > > missing these two rows:
> > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > > INSERT INTO CUI_TERMS
> > VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > > >
> > > > > > > > > > and only has this one:
> > > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes
> > > > > > > mellitus','mellitus')
> > > > > > > > > >
> > > > > > > > > > The end result is that "diabetes" is not being picked up
> in
> > > the
> > > > > > test
> > > > > > > > > text I
> > > > > > > > > > am running through- it requires the full 'diabetes
> > mellitus'.
> > > > > > > > > >
> > > > > > > > > > Is there any setting on the UMLS install side or the
> > cTAKES
> > > > > > > dictionary
> > > > > > > > > > creator that could account for missing alternative forms
> > like
> > > > > this?
> > > > > > > > I've
> > > > > > > > > > tried downloading the 2016AB release (which I think is
> the
> > > one
> > > > > used
> > > > > > > to
> > > > > > > > > > create the bundled sno_rx_16ab package?) and I am not
> > getting
> > > > the
> > > > > > > > > alternate
> > > > > > > > > > forms in that dictionary either.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Jeff
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Gandhi
> > > > > > >
> > > > > > > "The best way to find urself is to lose urself in the service
> of
> > > > others
> > > > > > > !!!"
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Thanks for following up Sean. I've looked into the links you sent along.
There are different groups of filters and it appears that the
dictionaryBuilder GUI is hardcoded to use the files in the "tiny"
directory. I don't think this is the set of filters used to make
sno_rx_16ab because the 'tiny' filter group contains "today" (today brand
veterinary product.  310367) in "UnwantedTexts.txt", but the
sno_rx_16ab.script file has "today" still in there. If you create a
dictionary with the dictionary builder, it does not include that term.

I thought maybe the set of files under the "default" filter directory might
be the one used for the sno_rx_16ab package, so I recompiled the
dictionaryCreator GUI to use the "default" filter files and created a new
SNOMED/RxNorm dictionary from the 2016AB UMLS release, but the output is
still quite different than the packaged sno_rx_16ab dictionary. From
looking at diffs, there are a substantial number of additional terms in
sno_rx_16ab, so many that I really must be missing something. For
example, for CUI 12169, which describes a low sodium diet, there are about
27 CUI terms in sno_rx_16ab.script, but in the script generated by the
dictionary GUI there are only 7 (with either the "tiny" or "default" filter group).
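
In case it helps anyone repeat the comparison, here is a rough sketch in Python of the kind of per-CUI diff described above. It assumes every relevant line in both .script files looks like the CUI_TERMS rows quoted earlier in the thread (five values, with the term in the fourth position); the function names are made up for illustration and nothing here comes from the cTAKES code base.

```python
import re
from collections import defaultdict

# Matches rows shaped like:
#   INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes mellitus','mellitus')
# Group 1 is the CUI, group 2 is the full term ('' is an escaped quote).
ROW = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),\d+,\d+,'((?:[^']|'')*)','(?:[^']|'')*'\)"
)


def terms_by_cui(lines):
    """Collect the set of terms for each CUI from an iterable of script lines."""
    terms = defaultdict(set)
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            terms[int(match.group(1))].add(match.group(2).replace("''", "'"))
    return terms


def missing_terms(reference, rebuilt):
    """Return {cui: sorted terms} present in reference but absent from rebuilt."""
    gaps = {}
    for cui, ref_terms in reference.items():
        gap = ref_terms - rebuilt.get(cui, set())
        if gap:
            gaps[cui] = sorted(gap)
    return gaps
```

Each file can be passed in as an open file object, for example the packaged sno_rx_16ab.script as the reference and a freshly generated script as the rebuilt one. Looking at the entry for a single CUI such as 12169 in the result then lists exactly which terms the rebuilt dictionary lacks.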

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by Remy Sanouillet <re...@foreseemed.com>.
Thanks for the clarifications, Sean. That was very enlightening. I look
forward to the documentation (even if it entails some suffering on your
part.)

If/when you stumble on some idle time allowing you to implement the manual
edit panel, it would be nice to have it allow for re-partitioning the
ontology. As you are well aware, UMLS CUIs and SNOMED codes do not always have a
one-to-one correspondence, resulting in a CUI matching multiple SNOMED codes or
a SNOMED code being mapped to several CUIs.
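
To make the shape of the problem concrete, a small sketch in Python that takes (CUI, SNOMED code) pairs and reports both kinds of non one-to-one mapping. The function name is hypothetical, and where the pairs come from (the dictionary database, MRCONSO, or elsewhere) is deliberately left open.

```python
from collections import defaultdict


def non_one_to_one(pairs):
    """Report CUIs with several SNOMED codes and SNOMED codes with several CUIs.

    `pairs` is any iterable of (cui, snomed_code) tuples.
    """
    codes_by_cui = defaultdict(set)
    cuis_by_code = defaultdict(set)
    for cui, code in pairs:
        codes_by_cui[cui].add(code)
        cuis_by_code[code].add(cui)
    # Keep only the entries that break the one-to-one assumption.
    cui_to_many = {cui: sorted(codes) for cui, codes in codes_by_cui.items() if len(codes) > 1}
    code_to_many = {code: sorted(cuis) for code, cuis in cuis_by_code.items() if len(cuis) > 1}
    return cui_to_many, code_to_many
```

The two returned maps would be the candidates a manual edit panel might offer for re-assignment.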

In some cases, clinicians don't agree with that partitioning in specialized
contexts and the inheritance that ensues and would like to re-assign them.

Not holding my breath, but just something to keep in mind.

      Remy

On Sun, Jun 16, 2019 at 7:16 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> >1) ...
> There are several collections of filter sets here:
> ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\
>
> 2) ...
> There is additional logic within the dictionary creator code:
> ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\
>
> I haven't gone through it in a really long time, and without doing so now
> I can't enumerate the filters.  I have family visiting, otherwise my
> curiosity would force me to do so and get back to you.   Honestly, it
> should be documented somewhere, but writing (especially technical) is
> pretty much my least favorite activity.
>
> Sean
>
>
> p.s.
> Please don't wait for it, but I am currently working on new dictionary
> code and plan to introduce that in ctakes.  Again, please don't wait for it
> as it is mixed in with other work and will not be available for several
> months (if at all).
>
>
> ________________________________________
> From: Jeffrey Miller <je...@gmail.com>
> Sent: Sunday, June 16, 2019 9:49 AM
> To: dev@ctakes.apache.org
> Subject: Re: Differences in dictionary built with dictionaryBuilder and
> sno_rx16ab from sourceforge [EXTERNAL]
>
> Hi Sean,
>
> Thanks for your response. I had two follow-up questions that would be very
> helpful to understand if you have a few moments:
>
> 1) Are the specific filters used in the official sno_rx_16ab codified
> anywhere so that I could reproduce them?
>
> 2) Do these filters explain all the changes? For example, when I use the
> dictionary creator to export sno_med and rx_norm, I only get "diabetes
> mellitus" whereas sno_rx_16ab contains both "diabetes" and "dm".
> Especially with the addition of "dm" it feels like I must be missing a step
> or a setting somewhere.
>
> Thanks!
> Jeff
>
> On Sun, Jun 16, 2019 at 8:55 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi all,
> >
> > The contents of the sno_rx_16ab are a dump of the umls 2016AB snomed and
> > rxnorm terms with certain symantic types.  Nothing was added, but
> synonyms
> > are filtered based upon various rules.  For instance, unnecessary
> suffixes
> > are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded
> > ("can walk straight line with only minimal assistance"), terms with dose
> or
> > form are ignored and so forth.
> >
> > Some filters can be changed by adding/removing from
> prefix/suffix/contains
> > lists in plaintext files or by modifying the dictionary creator code.
> >
> > There was no manual curation (or nothing major).  As Remy mentioned that
> > requires a lot of attention and time.  The dictionary database was not
> > intended to be perfect, just as good as possible without major
> investment -
> > and reproducible with updates to the umls.
> >
> > As the dictionary is released as a sql database, you should be able to
> add
> > and remove fairly easily if sql savvy.  I have long wanted to add a
> "manual
> > edit" panel to the dictionary gui, but haven't had the time.  If anybody
> > else would like to work on such a tool that would be tonic.
> >
> > Sean
> >
> >
> > ________________________________________
> > From: Harish Kulkarni <ha...@gmail.com>
> > Sent: Saturday, June 15, 2019 5:16 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Differences in dictionary built with dictionaryBuilder and
> > sno_rx16ab from sourceforge [EXTERNAL]
> >
> > unsubscribe
> >
> > On Sat, Jun 15, 2019 at 1:40 PM Remy Sanouillet <re...@foreseemed.com>
> > wrote:
> >
> > > Yes, I agree it would be nice because the tokenization that occurs when
> > > creating the dictionaries from the releases make comparisons a bit
> tricky
> > > and is not 100% reversible. I would love to hear an answer to your
> > > quandary.
> > >
> > >      Remy
> > >
> > > On Sat, Jun 15, 2019 at 1:23 PM Jeffrey Miller <je...@gmail.com>
> > wrote:
> > >
> > > > Thanks, I was curious if the cTAKES devs that created the sno_rx_16ab
> > > > dictionary had put the differences applied to the default UMLS output
> > > into
> > > > version control in some form. I imagine the
> > > > additions/synonyms/abbreviations that were added manually must have
> > been
> > > > collected over time somewhere prior to merging them with 2016ab UMLS
> > > > release? I basically want to recreate the default cTAKES 4.0.0
> release
> > > with
> > > > an additional ontology and the latest terms. I can likely come up
> with
> > a
> > > > diff myself but was wondering if this was already maintained as part
> of
> > > > cTAKES.
> > > >
> > > > On Sat, Jun 15, 2019 at 12:24 PM Remy Sanouillet <
> remys@foreseemed.com
> > >
> > > > wrote:
> > > >
> > > > > Yes, that's pretty much what we do too. Not only to enhance the
> > > > dictionary,
> > > > > but to put in corrections because, lo and behold, there are some
> > errors
> > > > in
> > > > > there!. As you know, an ontology is a constant curation job and
> that
> > > > > script, under SCM, allows you to isolate those changes and, if
> > > necessary,
> > > > > re-apply them to new versions.
> > > > >
> > > > >       Remy
> > > > >
> > > > > On Sat, Jun 15, 2019 at 8:36 AM gandhi rajan <
> > gandhirajan.n@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Jeff,
> > > > > >
> > > > > > As far as I know, maintaining a separate SQL script to add
> > additional
> > > > > > entries should work seamlessly.
> > > > > >
> > > > > > On Saturday, June 15, 2019, Jeffrey Miller <je...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > Thanks Remy. Does anyone know if these manually curated
> > > > > > > modifications/synonyms are tracked anywhere (aside from the
> > > > dictionary
> > > > > > > itself) so they can be carried forward in future dictionary
> > > updates?
> > > > > > >
> > > > > > > On Fri, Jun 14, 2019 at 4:28 PM Remy Sanouillet <
> > > > remys@foreseemed.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > From my experience, it seems pretty obvious that sno_rx_16ab
> > is a
> > > > > > curated
> > > > > > > > dictionary based on the SNOMED 2016AB release. It does not
> > > contain
> > > > > the
> > > > > > > full
> > > > > > > > set but it has additional edits and synonyms that are pretty
> > > useful
> > > > > > > > (including 'dm').
> > > > > > > >
> > > > > > > > We have had to manage those mods as an adjunct.
> > > > > > > >
> > > > > > > >       Remy
> > > > > > > >
> > > > > > > > On Fri, Jun 14, 2019 at 1:03 PM Jeffrey Miller <
> > > jeffmax@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > I have created a custom dictionary from the latest UMLS
> > release
> > > > > with
> > > > > > > > > SNOMEDCT_US and  RxNorm and I've noticed it seems to be
> > > > generating
> > > > > > > > .script
> > > > > > > > > file with unexpected differences as compared to the
> > sno_rx_16ab
> > > > > file
> > > > > > > > > available as part of the cTAKES release. Specifically, for
> > > > > diabetes,
> > > > > > it
> > > > > > > > is
> > > > > > > > > missing these two rows:
> > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
> > > > > > > > > INSERT INTO CUI_TERMS
> VALUES(11849,0,1,'diabetes','diabetes')
> > > > > > > > >
> > > > > > > > > and only has this one:
> > > > > > > > > INSERT INTO CUI_TERMS VALUES(11849,1,2,'diabetes
> > > > > > mellitus','mellitus')
> > > > > > > > >
> > > > > > > > > The end result is that "diabetes" is not being picked up in
> > the
> > > > > test
> > > > > > > > text I
> > > > > > > > > am running through- it requires the full 'diabetes
> mellitus'.
> > > > > > > > >
> > > > > > > > > Is there any setting on the UMLS install side or the
> ctTAKES
> > > > > > dictionary
> > > > > > > > > creator that could account for missing alternative forms
> like
> > > > this?
> > > > > > > I've
> > > > > > > > > tried downloading the 2016AB release (which I think is the
> > one
> > > > used
> > > > > > to
> > > > > > > > > create the bundled sno_rx_16ab package?) and I am not
> getting
> > > the
> > > > > > > > alternate
> > > > > > > > > forms in that dictionary either.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jeff
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Gandhi
> > > > > >
> > > > > > "The best way to find urself is to lose urself in the service of
> > > others
> > > > > > !!!"
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

> 1) ...
There are several collections of filter sets here:
ctakes-gui-res\src\main\resources\org\apache\ctakes\gui\dictionary\data\

> 2) ...
There is additional logic within the dictionary creator code:
ctakes-gui\src\main\java\org\apache\ctakes\gui\dictionary\

I haven't gone through it in a really long time, and without doing so now I can't enumerate the filters. I have family visiting; otherwise my curiosity would force me to do so and get back to you. Honestly, it should be documented somewhere, but writing (especially technical writing) is pretty much my least favorite activity.

Sean


p.s.
Please don't wait for it, but I am currently working on new dictionary code and plan to introduce it in cTAKES. Again, please don't wait for it, as it is mixed in with other work and will not be available for several months (if at all).



Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Hi Sean,

Thanks for your response. I have two follow-up questions; the answers would
be very helpful if you have a few moments:

1) Are the specific filters used in the official sno_rx_16ab codified
anywhere so that I could reproduce them?

2) Do these filters explain all the changes? For example, when I use the
dictionary creator to export SNOMEDCT_US and RxNorm, I only get "diabetes
mellitus", whereas sno_rx_16ab contains both "diabetes" and "dm".
Especially with the addition of "dm", it feels like I must be missing a step
or a setting somewhere.

Thanks!
Jeff


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi all,

The contents of sno_rx_16ab are a dump of the UMLS 2016AB SNOMED and RxNorm terms with certain semantic types.  Nothing was added, but synonyms are filtered based upon various rules.  For instance, unnecessary suffixes are removed ("Wart (Finding)" -> "Wart"), really long terms are excluded ("can walk straight line with only minimal assistance"), terms with dose or form are ignored, and so forth.
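
To make the two example rules above concrete, here is a rough Python sketch of that kind of filtering. It is not the dictionary creator's actual code (that is Java); the function name, the regular expression, and the MAX_TOKENS threshold are assumptions made up for illustration.

```python
import re

# Hypothetical threshold; the real dictionary creator may use a different limit.
MAX_TOKENS = 6

# Matches one trailing parenthetical tag such as " (Finding)".
_TRAILING_TAG = re.compile(r"\s*\([^()]*\)\s*$")


def clean_synonym(term):
    """Return a cleaned synonym, or None if the term should be excluded."""
    stripped = _TRAILING_TAG.sub("", term).strip()
    if not stripped:
        return None
    # Exclude "really long" terms, measured here in whitespace-separated tokens.
    if len(stripped.split()) > MAX_TOKENS:
        return None
    return stripped


print(clean_synonym("Wart (Finding)"))
print(clean_synonym("can walk straight line with only minimal assistance"))
```

With these assumed settings the first call keeps "Wart" and the second term is dropped because it has more than six tokens.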

Some filters can be changed by adding to or removing from the prefix/suffix/contains lists in plaintext files, or by modifying the dictionary creator code.
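
As an illustration of how prefix/suffix/contains lists can drive such filters, here is a minimal Python sketch. The list format (one entry per line, '#' for comments), the function names, and the sample entries are all assumptions; the real list files and the Java code that reads them may look different.

```python
def parse_list(text):
    """Parse a plaintext list: one entry per line, lines starting with '#' are comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def is_rejected(term, prefixes=(), suffixes=(), contains=()):
    """True if the term starts with, ends with, or contains an unwanted entry."""
    lowered = term.lower()
    return (
        any(lowered.startswith(p) for p in prefixes)
        or any(lowered.endswith(s) for s in suffixes)
        or any(c in lowered for c in contains)
    )


# Hypothetical list contents, for illustration only.
unwanted_suffixes = parse_list("# dose/form endings\n oral tablet\n mg\n")
print(is_rejected("Metformin 500 mg", suffixes=unwanted_suffixes))
print(is_rejected("Metformin", suffixes=unwanted_suffixes))
```

Adding or removing a line in such a list changes which terms are rejected without touching the code, which is the appeal of keeping the lists in plaintext.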

There was no manual curation (or nothing major).  As Remy mentioned, that requires a lot of attention and time.  The dictionary database was not intended to be perfect, just as good as possible without major investment, and reproducible with updates to the UMLS.

As the dictionary is released as a SQL database, you should be able to add and remove entries fairly easily if you are SQL savvy.  I have long wanted to add a "manual edit" panel to the dictionary GUI, but haven't had the time.  If anybody else would like to work on such a tool, that would be tonic.
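
For anyone keeping extra synonyms in a separate add-on script, here is a small Python sketch that emits rows in the same shape as the CUI_TERMS rows quoted in this thread. The column meanings (CUI, index of the rare word, token count, term text, rare word) are inferred from those rows, the function name is made up, and plain whitespace splitting is a simplification of the real tokenization. The caller has to choose the rare word.

```python
def cui_terms_insert(cui, text, rare_word):
    """Build one CUI_TERMS row shaped like the rows quoted in this thread."""
    tokens = text.lower().split()
    rare_word = rare_word.lower()
    if rare_word not in tokens:
        raise ValueError("rare word must be one of the term's tokens")
    rare_index = tokens.index(rare_word)
    # Double any single quotes so the values stay valid SQL string literals.
    joined = " ".join(tokens).replace("'", "''")
    rare_sql = rare_word.replace("'", "''")
    return "INSERT INTO CUI_TERMS VALUES({},{},{},'{}','{}')".format(
        cui, rare_index, len(tokens), joined, rare_sql
    )


for term, rare in [("dm", "dm"), ("diabetes", "diabetes")]:
    print(cui_terms_insert(11849, term, rare))
```

Keeping the list of (term, rare word) pairs under version control and regenerating the rows after each UMLS update is one way to carry manual additions forward.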

Sean



Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Harish Kulkarni <ha...@gmail.com>.
unsubscribe


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Remy Sanouillet <re...@foreseemed.com>.
Yes, I agree it would be nice, because the tokenization that occurs when
creating the dictionaries from the releases makes comparisons a bit tricky
and is not 100% reversible. I would love to hear an answer to your quandary.

     Remy


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Jeffrey Miller <je...@gmail.com>.
Thanks. I was curious whether the cTAKES devs who created the sno_rx_16ab
dictionary had put the differences applied to the default UMLS output into
version control in some form. I imagine the
additions/synonyms/abbreviations that were added manually must have been
collected over time somewhere prior to merging them with the 2016AB UMLS
release. I basically want to recreate the default cTAKES 4.0.0 release with
an additional ontology and the latest terms. I can likely come up with a
diff myself but was wondering if this was already maintained as part of
cTAKES.
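
For anyone who wants to build such a diff, here is a minimal sketch in
Python. It assumes both dictionaries are HSQLDB .script files whose
CUI_TERMS rows look like the INSERT lines quoted in this thread. The names
parse_cui_terms and missing_rows are hypothetical, and rows are compared
exactly as written, so tokenization differences between builds will still
show up as differences.

```python
import re
import sys

# Matches rows shaped like the ones quoted in this thread, e.g.
# INSERT INTO CUI_TERMS VALUES(11849,0,1,'dm','dm')
ROW = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)"
)


def parse_cui_terms(lines):
    """Collect CUI_TERMS rows as tuples; every other line is ignored."""
    rows = set()
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            cui, rindex, tcount, text, rword = match.groups()
            rows.add((int(cui), int(rindex), int(tcount), text, rword))
    return rows


def missing_rows(bundled_lines, custom_lines):
    """Rows present in the bundled dictionary but absent from the custom one."""
    return sorted(parse_cui_terms(bundled_lines) - parse_cui_terms(custom_lines))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are placeholders): python diff_terms.py bundled.script custom.script
    with open(sys.argv[1], encoding="utf-8") as bundled, \
            open(sys.argv[2], encoding="utf-8") as custom:
        for row in missing_rows(bundled, custom):
            print(row)
```

The output is a candidate list of curated additions; it will also include
any rows that differ only because the underlying UMLS release changed, so it
needs manual review.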


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Remy Sanouillet <re...@foreseemed.com>.
Yes, that's pretty much what we do too, not only to enhance the dictionary
but also to put in corrections because, lo and behold, there are some errors
in there! As you know, an ontology is a constant curation job, and that
script, under SCM, allows you to isolate those changes and, if necessary,
re-apply them to new versions.
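
As an illustration of re-applying a version-controlled adjunct to a freshly
built dictionary, here is a minimal sketch in Python. The function name
apply_adjunct is hypothetical, and the sketch only merges lines of text. It
assumes the adjunct holds one INSERT statement per line with "--" comment
lines, and that appending INSERT lines to the end of the generated .script
file is acceptable to HSQLDB; that last point should be verified by loading
the result in cTAKES.

```python
def apply_adjunct(dictionary_lines, adjunct_lines):
    """Return the dictionary lines plus adjunct rows not already present.

    Running it again on its own output adds nothing, so the adjunct can be
    re-applied safely after every dictionary rebuild.
    """
    merged = [line.rstrip("\n") for line in dictionary_lines]
    seen = set(merged)
    for line in adjunct_lines:
        row = line.strip()
        if not row or row.startswith("--"):
            continue  # skip blank lines and comments in the adjunct file
        if row not in seen:
            merged.append(row)
            seen.add(row)
    return merged
```

Keeping the adjunct as plain INSERT lines means an ordinary text diff in
version control shows exactly which curated rows were added or corrected.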

      Remy


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by gandhi rajan <ga...@gmail.com>.
Hi Jeff,

As far as I know, maintaining a separate SQL script to add additional
entries should work seamlessly.
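
A minimal sketch of how the lines for such a separate script could be
generated, in Python. The column order (CUI, rare-word index, token count,
text, rare word) is inferred from the CUI_TERMS rows quoted in this thread,
and the helper name cui_terms_insert is hypothetical. Whitespace splitting
stands in for the real cTAKES tokenizer, so multi-token terms should be
checked against rows produced by the dictionary creator.

```python
def cui_terms_insert(cui, text, rare_index=0):
    """Build one INSERT line in the format of the rows quoted in this thread.

    Assumed column order: CUI, rare-word index, token count, text, rare word.
    Whitespace splitting is a stand-in for the cTAKES tokenizer.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("term text is empty")
    if not 0 <= rare_index < len(tokens):
        raise ValueError("rare_index is outside the token range")

    def quote(value):
        # SQL string literal: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"

    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rare_index, len(tokens), quote(" ".join(tokens)), quote(tokens[rare_index])
    )


if __name__ == "__main__":
    # The two synonyms reported missing for CUI 11849.
    for term in ("dm", "diabetes"):
        print(cui_terms_insert(11849, term))
```

For multi-token terms the caller has to say which token is the rare word,
because picking it automatically would need the token frequencies of the
whole dictionary.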



-- 
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"

Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Jeffrey Miller <je...@gmail.com>.
Thanks Remy. Does anyone know if these manually curated
modifications/synonyms are tracked anywhere (aside from the dictionary
itself) so they can be carried forward in future dictionary updates?


Re: Differences in dictionary built with dictionaryBuilder and sno_rx16ab from sourceforge

Posted by Remy Sanouillet <re...@foreseemed.com>.
From my experience, it seems pretty obvious that sno_rx_16ab is a curated
dictionary based on the SNOMED 2016AB release. It does not contain the full
set but it has additional edits and synonyms that are pretty useful
(including 'dm').

We have had to manage those mods as an adjunct.

      Remy
