Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2015/12/08 22:22:23 UTC

RE: ctakes with icd10; 2015 versions available on sourceforge!

Hi Brandon, thanks for finding and forwarding the instructions!

I have checked in two new hsqldb dictionaries, both from the 2015AB version of the UMLS.  They both have codes for snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.

One uses cuis filtered by snomed and rxnorm, the other adds cuis filtered by icd9 and icd10.
What this means:  Cuis that exist for a [filter source] are added to the dictionary, as are all text variations from all sources that contain that cui.  Both dictionaries also use the standard ctakes semantic group tui filters.
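
As a rough illustration of that filtering rule (not the dictionary tool's actual code), the sketch below keeps every cui that appears under at least one filter source, then collects the text of every row carrying one of those cuis, whatever its source. The Row class, the cuis and the sample terms are all made up for the example.

```java
import java.util.*;

final class CuiFilterSketch {

   // One hypothetical UMLS row: concept id, source vocabulary, term text.
   static final class Row {
      final String cui, source, text;
      Row( String cui, String source, String text ) {
         this.cui = cui;
         this.source = source;
         this.text = text;
      }
   }

   // Keep cuis seen in any filter source, then gather text for those cuis from ALL sources.
   static Map<String, Set<String>> buildTerms( List<Row> rows, Set<String> filterSources ) {
      final Set<String> keptCuis = new HashSet<>();
      for ( Row row : rows ) {
         if ( filterSources.contains( row.source ) ) {
            keptCuis.add( row.cui );
         }
      }
      final Map<String, Set<String>> terms = new TreeMap<>();
      for ( Row row : rows ) {
         if ( keptCuis.contains( row.cui ) ) {
            terms.computeIfAbsent( row.cui, c -> new TreeSet<>() ).add( row.text );
         }
      }
      return terms;
   }

   public static void main( String[] args ) {
      final List<Row> rows = Arrays.asList(
            new Row( "C0000001", "SNOMEDCT_US", "heart attack" ),
            new Row( "C0000001", "MSH", "myocardial infarction" ),
            new Row( "C0000002", "ICD10PCS", "some procedure" ) );
      final Set<String> snorx = new HashSet<>( Arrays.asList( "SNOMEDCT_US", "RXNORM" ) );
      System.out.println( buildTerms( rows, snorx ) );
   }
}
```

Adding "ICD10PCS" to the filter set would also bring in C0000002, which mirrors how ctakesicd2015 ends up larger than ctakessnorx2015.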

The names are ctakessnorx2015 and ctakesicd2015

The snomed rxnorm :  
http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/

The snomed rxnorm icd9 icd10:
http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

The svn root for the whole ugly thing is:
 svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk

Stats:
ctakessnorx2015
545,913 Terms
229,251 Concepts (Cuis)
272,987 Snomed codes
32,419 Rxnorm codes
11,321 icd9 codes
61 icd10 codes

ctakesicd2015
611,230 Terms
282,211 Concepts
18,626 icd9 codes
45,818 icd10 codes
Snomed and Rxnorm counts are the same

So, adding the icd filters gave us an extra ~53,000 concepts and ~65,000 terms.
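
For anyone who wants to check the rounding, the differences can be recomputed from the stats listed above. This is only arithmetic on the posted numbers; the class and method names are made up.

```java
final class DictionaryStatsDelta {

   // Difference between the icd-filtered dictionary and the snomed/rxnorm-only one.
   static int delta( int withIcdFilters, int withoutIcdFilters ) {
      return withIcdFilters - withoutIcdFilters;
   }

   public static void main( String[] args ) {
      System.out.println( "extra concepts:    " + delta( 282_211, 229_251 ) );
      System.out.println( "extra terms:       " + delta( 611_230, 545_913 ) );
      System.out.println( "extra icd9 codes:  " + delta( 18_626, 11_321 ) );
      System.out.println( "extra icd10 codes: " + delta( 45_818, 61 ) );
   }
}
```

The exact differences are 52,960 concepts and 65,317 terms, consistent with the rounded figures.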

I would like to move this all to a better root (not ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to write directly in trunk (??) and need to get moving on to other things.

There is help on the ctakes wiki: https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
Though I should probably add a few items ...


Sean


-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu] 
Sent: Tuesday, December 08, 2015 12:51 PM
To: dev@ctakes.apache.org
Subject: RE: ctakes with icd10

Not to perpetuate the instructions again but I sent these out not long ago when I was going through the process and Sean was helping me.

	1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
	2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
	3. Run DictionaryCreator2
	java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
	4. Run CodeMapCreator
	java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
	5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
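
One portability note on steps 3 and 4: the ";" in the classpath is the Windows separator, while Linux and macOS use ":". A minimal sketch of building the same argument list with the platform's own separator is below. The class name, method name and the paths in main are placeholders; the flags and tool class names are the ones from the steps above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

final class DictionaryToolCommand {

   // Builds the argument list for one of the two tool classes named in steps 3 and 4.
   static List<String> build( String mainClass, String umlsMeta, String atuiFile, String dbUrl ) {
      // File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
      final String classpath = "dictionarytool.jar" + File.pathSeparator + "lib/*";
      return Arrays.asList( "java", "-cp", classpath, mainClass,
            "-umls", umlsMeta, "-atui", atuiFile, "-db", dbUrl, "-tbl", "CUI_TERMS" );
   }

   public static void main( String[] args ) {
      final String pkg = "org.apache.ctakes.dictionarytool.";
      for ( String tool : new String[] { "DictionaryCreator2", "CodeMapCreator" } ) {
         // The paths below are placeholders, not real locations.
         System.out.println( String.join( " ", build( pkg + tool, "/path/to/umls/META",
               "./data/tiny/CtakesAnatTuis.txt", "jdbc:hsqldb:file:/path/to/newDB/snorx2015" ) ) );
      }
   }
}
```

The list can be handed to a ProcessBuilder as-is. If the printed line is pasted into a Unix shell instead, the lib/* entry should be quoted so the shell does not expand it before java sees it.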

Thanks,
Brandon

-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com]
Sent: Tuesday, December 08, 2015 12:47 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10

This seems like a pretty common request and with such an old version of UMLS database shipped with cTAKES it's only going to get worse. I've been wanting to build a dictionary using the latest UMLS release (as well as a custom database), so would be happy to write up the steps as I go through it. That assumes that I can dig up the instructions in the dev list.

- Dave

On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Alaa,
>
> The -shortest- answer is that you'll need to run the dictionary 
> creation tool.  There are instructions in older devlist threads.  By 
> default the dictionary creation tool does add icd9 and icd10 tables to the dictionary.
> The problem is that in Umls 2011AB those codes weren't very well 
> populated.  The 2015AB icd# set is much more rich so those tables 
> should be pretty good.  Then in ctakes you would look up annotations 
> by icd9 or icd10 codes instead of by cui:
> OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>
> Sean
>
> -----Original Message-----
> From: Savova, Guergana [mailto:Guergana.Savova@childrens.harvard.edu]
> Sent: Tuesday, December 08, 2015 12:17 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Hi Alaa,
> You need to create a resource off the terminology/ontology you want to 
> use (in this case ICD9 or ICD10). Then run that resource with cTAKES 
> for the fast dictionary lookup. There is cTAKES code and some 
> documentation on how to create that resource. By default, cTAKES runs 
> with a resource created from the English version of SNOMED CT and RxNORM.
> Hope this helps.
> --Guergana
>
> -----Original Message-----
> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> Sent: Tuesday, December 8, 2015 10:01 AM
> To: dev@ctakes.apache.org
> Subject: ctakes with icd10
>
> Hi,
>
> I downloaded the latest UMLS version, and I want to know how to make 
> ctakes work with icd10 and icd9.
>
>
> Thanks
>


IMPORTANT WARNING: The information in this message (and the documents attached to it, if any) is confidential and may be legally privileged. It is intended solely for the addressee. Access to this message by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken, or omitted to be taken, in reliance on it is prohibited and may be unlawful. If you have received this message in error, please delete all electronic copies of this message (and the documents attached to it, if any), destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you.

Geisinger Health System utilizes an encryption process to safeguard Protected Health Information and other confidential data contained in external e-mail messages. If email is encrypted, the recipient will receive an e-mail instructing them to sign on to the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
I would definitely be happy to work with you!
I've sent some questions to your personal email to prevent devlist spamming.

Sean
-----Original Message-----
From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu] 
Sent: Tuesday, December 08, 2015 10:36 PM
To: dev@ctakes.apache.org
Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!

I'd be interested in contributing to making the dictionary tool more user friendly with a GUI.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
Sent: Tuesday, December 08, 2015 6:12 PM
To: dev@ctakes.apache.org
Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!

Hi Dave,

I'm always happy to see interest in our stuff!

>Step 1
I built the tool to be able to build a dictionary using anything in the umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a problem.  You just add it to the CtakesSources file (or create an alternate file and point to it with -src).  To answer another of your questions, there can be zero or more sources - you saw snomedct and snomedct_us (each valid in a different umls version).  
It also can include any semantic type, just add (or remove) the appropriate tuis in a different data file.
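
A minimal sketch of the "alternate file" route, assuming the sources file is plain text with one source abbreviation per line (which matches how the file is described elsewhere in this thread). The class name, file name and MY_EXTENSION_SAB are placeholders; the real abbreviation has to come from your UMLS release.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class CustomSourcesFile {

   // Writes one source abbreviation per line and returns the arguments that point the tool at the file.
   static List<String> writeSources( Path file, List<String> sources ) throws IOException {
      Files.write( file, sources );
      return Arrays.asList( "-src", file.toString() );
   }

   public static void main( String[] args ) throws IOException {
      final Path file = Files.createTempFile( "MySources", ".txt" );
      // "MY_EXTENSION_SAB" is a placeholder, not a real UMLS source abbreviation.
      final List<String> extraArgs = writeSources( file,
            Arrays.asList( "SNOMEDCT_US", "RXNORM", "MY_EXTENSION_SAB" ) );
      System.out.println( String.join( " ", extraArgs ) );
   }
}
```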

>Step 2
You have it right - you copy the templates to another location and output to that location.  Otherwise you 'lose' your templates.
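
The copy step could be scripted roughly as below, so the originals in the template directory are never touched. The class name is made up and both directories are passed in by the caller. Whether the copies also need to be renamed to match the database name given to -db is not stated in this thread, so the sketch keeps the original file names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class CopyDbTemplates {

   // The two template files named in step 2.
   static final String[] TEMPLATE_FILES = { "ctakesumls.properties", "ctakesumls.script" };

   // Copies the template files into outputDir, leaving the originals in place.
   static void copyTemplates( Path templateDir, Path outputDir ) throws IOException {
      Files.createDirectories( outputDir );
      for ( String name : TEMPLATE_FILES ) {
         Files.copy( templateDir.resolve( name ), outputDir.resolve( name ),
               StandardCopyOption.REPLACE_EXISTING );
      }
   }

   public static void main( String[] args ) throws IOException {
      if ( args.length != 2 ) {
         System.err.println( "usage: CopyDbTemplates <templateDir> <outputDir>" );
         return;
      }
      copyTemplates( Paths.get( args[ 0 ] ), Paths.get( args[ 1 ] ) );
   }
}
```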

>Step 3 and 4
The jar is built from source.  I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar.  The lib/ directory is in the source repository.

Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ...  The best option (imo) is probably to make an easy to use gui and keep a pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a chance to do that ...

Sean


-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com]
Sent: Tuesday, December 08, 2015 4:57 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thanks, Sean! It's great that cTAKES may soon have an up to date database out of the box. Hopefully it will cut down on the need for many to build their own DB's. Thank you much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in veterinary medicine so I need to add in the veterinary extension for SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines in it. First line is SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. Not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?

What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do I'd be happy to write up an easy to follow tutorial for building a custom dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code itself? Like including it in the main cTAKES release? It seems there is demand for it.

- Dave


RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Alaa,

Sorry for the late reply, I'll try to get you something a little more useful in the next few days.

Sean

-----Original Message-----
From: Alaa al Barari [mailto:alaa.albarari@gmail.com] 
Sent: Thursday, December 10, 2015 12:27 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

I am really thankful.

I believe the issue is in here:

   static public void writeCuiCodes( final String termFilePath,
                                     final Map<String, Map<String, Collection<String>>> cuiCodes ) {
      final List<String> codeSources = Arrays.asList( "ICD10PCS", "ICD9CM", "RXNORM", "SNOMEDCT" );
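
To make that suspicion concrete: if the method only walks the fixed codeSources list, then codes from any other source would be skipped even when they are present in cuiCodes. The fragment above does not show the method body, so that is an assumption, as is reading cuiCodes as cui -> source -> codes. The sketch below uses made-up data and is not the tool's code.

```java
import java.util.*;

final class FixedCodeSourcesSketch {

   // Same four names as the hardcoded list in the fragment above.
   static final List<String> CODE_SOURCES = Arrays.asList( "ICD10PCS", "ICD9CM", "RXNORM", "SNOMEDCT" );

   // Returns the sources present in cuiCodes that a loop over CODE_SOURCES would never visit.
   static Set<String> skippedSources( Map<String, Map<String, Collection<String>>> cuiCodes ) {
      final Set<String> skipped = new TreeSet<>();
      for ( Map<String, Collection<String>> bySource : cuiCodes.values() ) {
         for ( String source : bySource.keySet() ) {
            if ( !CODE_SOURCES.contains( source ) ) {
               skipped.add( source );
            }
         }
      }
      return skipped;
   }

   public static void main( String[] args ) {
      final Map<String, Collection<String>> bySource = new HashMap<>();
      bySource.put( "ICD10CM", Arrays.asList( "X00.0" ) );   // hypothetical code
      bySource.put( "RXNORM", Arrays.asList( "12345" ) );    // hypothetical code
      final Map<String, Map<String, Collection<String>>> cuiCodes = new HashMap<>();
      cuiCodes.put( "C0000001", bySource );                  // hypothetical cui
      System.out.println( skippedSources( cuiCodes ) );
   }
}
```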



On Thu, Dec 10, 2015 at 7:02 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Alaa,
>
> No worries, it can be pretty confusing.
>
> 1  The tool is a prototype hack.  So, it is in the svn sandbox:
> https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool
>
> 2  I think that you should be able to add "icd10cm" as a line in your 
> CtakesSources.txt data file.  But I should probably test it myself 
> before you waste any time on it.  If a change to the code is required 
> I can take care of it.  I will try to get to it ~10 pm Eastern time.
>
> 3  The lines that you are seeing look perfect.
>
> 4  The tuis inside CtakesAnatTuis.txt are the codes for the semantic 
> types that comprise the "Anatomical Site" semantic group in ctakes.  
> You should never need to change the file, but you do need to point to it.
>
> 5  Unfortunately there isn't any real documentation, just help emails 
> from myself and others on the devlist.  As I said, it is a prototype 
> and not an official tool or part of the ctakes release.
>
> Sean
>
>
>
> -----Original Message-----
> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> Sent: Thursday, December 10, 2015 10:38 AM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>
> Hi Finan,
>
> I am sorry if I am asking too much but I am really stuck ...
>
> 1- Could you please give me a link where I can download the latest 
> version of dictionarytool?
> 2- The current version I have always produces output for icd10pcs, although 
> my -src file lists icd10CM.  Is icd10pcs statically added inside 
> dictionarytool?  If I change it in the code, should it work?
> 3- After running the tool, lines like the ones below are added to the .script 
> file.  Am I on the right track?
> INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')
> INSERT INTO CUI_TERMS VALUES(20417,0,2,'os hyoideum','os')
>
> 4- As naive as this sounds, what are the tuis inside CtakesAnatTuis.txt?
>
> 5- Is there any documentation you advise reading?
>
>
> On Thu, Dec 10, 2015 at 10:37 AM, Alaa al Barari <al...@gmail.com> wrote:
>
> > Finan, where can I download the 2015 .properties files from SourceForge?
> > Do those include all the ICDs and SNOMED?
> >
> > I prefer to learn how to generate my own db because I will need to 
> > create my own later on, so your help is appreciated.
> >
> > On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari <al...@gmail.com> wrote:
> >
> >> Thanks, but is what I ended up with wrong?
> >> On Dec 10, 2015 4:26 AM, "Finan, Sean" <Se...@childrens.harvard.edu> wrote:
> >>
> >>> Hi Alaa,
> >>>
> >>> If you downloaded the 2015 .property and .script files then you do 
> >>> not need to run the dictionary creation tool.  Those databases are 
> >>> already populated and ready to use.
> >>>
> >>> Sean
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >>> Sent: Wednesday, December 09, 2015 6:33 PM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
> >>>
> >>> So basically it looks like the path had Desktop capitalized; that's why 
> >>> it did not work.
> >>>
> >>> I ended up having rows like this inside ctakesicd2015.script:
> >>>
> >>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
> >>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
> >>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
> >>>
> >>>
> >>> Does this mean it worked?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <alaa.albarari@gmail.com> wrote:
> >>>
> >>> > Thanks Finan and Brandon, your help is appreciated a lot.
> >>> >
> >>> > I downloaded the dictionary tool from 
> >>> > https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
> >>> > I hope it's the latest and bug free.
> >>> >
> >>> >
> >>> > my running command is:
> >>> > java -cp ./dictionarytool.jar:lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /home/abarari/Desktop/umls/2015AB/META/ -atui ./data/optional/CtakesAnatTuis.txt -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 -tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt -tui ./data/optional/CtakesAllTuis.txt
> >>> >
> >>> >
> >>> >
> >>> > I am running on Ubuntu, by the way.  Under 
> >>> > /home/abarari/Desktop/dictionarytool/output/ there is only:
> >>> >
> >>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
> >>> >  ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script
> >>> >
> >>> >
> >>> > Where is the database?  Am I doing something wrong?  Do I need 
> >>> > to create the database before executing the dictionarytool?
> >>> >
> >>> >
> >>> > I found a couple of issues in the dictionary tool; it does not 
> >>> > work well with relative paths.
> >>> >
> >>> >
> >>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
> >>> >
> >>> >> Brandon,
> >>> >> That sounds great!
> >>> >> Please open a Jira ticket for any contributions (anyone should 
> >>> >> be able to create a Jira account).  There are some legal items 
> >>> >> built into the ASF Jira attachments for accepting contributions/donations.
> >>> >> It will also credit the contributors with the merit appropriately.
> >>> >> Anyone who is interested can follow the Jira item. (Even better 
> >>> >> if contributions were open discussion/open development.)
> >>> >> --Pei
> >>> >>
> >>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
> >>> >> <bd...@geisinger.edu> wrote:
> >>> >> > I'd be interested in contributing to making the dictionary 
> >>> >> > tool more
> >>> >> user friendly with a GUI.
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Brandon
> >>> >> >
> >>> >> > -----Original Message-----
> >>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> >>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
> >>> >> > To: dev@ctakes.apache.org
> >>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
> >>> sourceforge!
> >>> >> >
> >>> >> > Hi Dave,
> >>> >> >
> >>> >> > I'm always happy to see interest in our stuff!
> >>> >> >
> >>> >> >>Step 1
> >>> >> > I built the tool to be able to build a dictionary using 
> >>> >> > anything in the
> >>> >> umls - snomed, icd9, hpo, etc. so using the veterinary 
> >>> >> extension shouldn't be a problem.  You just add it to the 
> >>> >> CtakesSources file (or create an alternate file and point to it 
> >>> >> with -src).  To answer another of your questions, there can be 
> >>> >> zero or more sources - you saw snomedct and snomedct_us (each 
> >>> >> valid in a
> different umls version).
> >>> >> > It also can include any semantic type, just add (or remove) 
> >>> >> > the
> >>> >> appropriate tuis in a different data file.
> >>> >> >
> >>> >> >>Step 2
> >>> >> > You have it right - you copy the templates to another 
> >>> >> > location and
> >>> >> output to that location.  Otherwise you 'lose' your templates.
> >>> >> >
> >>> >> >>Step 3 and 4
> >>> >> > The jar is built from source.  I need to (soon) check in 
> >>> >> > updates to the
> >>> >> source, and at the same time I can check in a default prebuilt 
> >>> >> .jar The lib/ directory is in the source repository.
> >>> >> >
> >>> >> > Various people have toyed with the idea of putting the tool 
> >>> >> > into a
> >>> >> ctakes module, putting it into an "installation package", 
> >>> >> making a
> >>> gui ...
> >>> >> The best option (imo) is probably to make an easy to use gui 
> >>> >> and keep a pre-built version in sandbox.  Someday, after the 
> >>> >> rainbow, maybe I'll get a chance to do that ...
> >>> >> >
> >>> >> > Sean
> >>> >> >
> >>> >> >
> >>> >> > -----Original Message-----
> >>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
> >>> >> > To: dev@ctakes.apache.org
> >>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
> >>> sourceforge!
> >>> >> >
> >>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to 
> >>> >> > date
> >>> >> database out of the box. Hopefully it will cut down on the need 
> >>> >> for many to build their own DB's. Thank you much for doing that.
> >>> >> >
> >>> >> > Unfortunately, I still will need to build a custom one for us.
> >>> >> > I work
> >>> >> in veterinary medicine so I need to add in the veterinary 
> >>> >> extension for SNOMED-CT into the database.
> >>> >> >
> >>> >> > I looked over the steps below that Brandon included and have 
> >>> >> > some
> >>> >> questions:
> >>> >> >
> >>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
> >>> "SNOMEDCT"
> >>> >> to "SNOMEDCT_US". The file that I have has two lines in it. 
> >>> >> First line is SNOMED, second line is SNOMEDCT_US. So this step 
> >>> >> doesn't
> >>> really make sense.
> >>> >> >
> >>> >> > step 2 should reference the two scripts as being in
> >>> >> resource/memdbtemplate so others don't have to search for them.
> >>> >> Not sure what it means to move them to "location to put new 
> >>> >> UMLS
> DB".
> >>> >> Does that mean move them into a new directory where the newly 
> >>> >> created UMLS DB will get written?
> >>> >> >
> >>> >> > steps 3 and 4 for running the tools reference 
> >>> >> > dictionarytool.jar which
> >>> >> doesn't exist. Does one need to build that somehow from the 
> >>> >> source before running it? The command line also adds "lib/*" to 
> >>> >> the classpath. Is that the lib directory inside the 
> >>> >> dictionarytool source code or some other location?
> >>> >> >
> >>> >> > What else would I need to do to include the SNOMED-CT 
> >>> >> > Veterinary
> >>> >> Extension along with the snomedct and rxnorm sources?
> >>> >> >
> >>> >> > I'll probably not have time to try this out for a while yet, 
> >>> >> > but when I
> >>> >> do I'd be happy to write up an easy to follow tutorial for 
> >>> >> building a custom dictionary assuming I am able to get it to work.
> >>> >> >
> >>> >> > Has anyone considered making this tool available outside of 
> >>> >> > the source
> >>> >> code itself? Like including it in the main cTAKES release? It 
> >>> >> seems there is demand for it.
> >>> >> >
> >>> >> > - Dave
> >>> >> >
> >>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
> >>> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>> >> >
> >>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
> >>> >> >>
> >>> >> >> I have checked in two new hsqldb dictionaries, both from the 
> >>> >> >> 2015AB version of the UMLS.  They both have codes for 
> >>> >> >> snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the 
> >>> >> >> usual cui, tui, preferred term
> >>> >> mappings.
> >>> >> >>
> >>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds 
> >>> >> >> cuis filtered by icd9 and icd10.
> >>> >> >> What this means:  Cuis that exist for a [filter source] are 
> >>> >> >> added to the dictionary, as are all text variations from all 
> >>> >> >> sources that contain that cui.  Both dictionaries also use 
> >>> >> >> the standard ctakes semantic group tui filters.
> >>> >> >>
> >>> >> >> The names are ctakessnorx2015 and ctakesicd2015
> >>> >> >>
> >>> >> >> The snomed rxnorm :
> >>> >> >>
> >>> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__sourcefo
> >>> >> >> rg
> >>> >> >> e.ne
> >>> >> >> t_p_
> >>> >> >> ctakesresources_code_HEAD_tree_trunk_ctakes-2Dresources-2Dsn
> >>> >> >> om
> >>> >> >> ed-2
> >>> >> >> Drwo
> >>> >> >> rd-2Dhsqldb-2D2011ab_src_main_resources_org_apache_ctakes_di
> >>> >> >> ct
> >>> >> >> iona
> >>> >> >> ry_l
> >>> >> >> ookup_fast_ctakessnorx2015_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3x
> >>> >> >> hK
> >>> >> >> wEW1
> >>> >> >> 4JZM
> >>> >> >> SdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m
> >>> >> >> =S
> >>> >> >> Rqws
> >>> >> >> l3Fm
> >>> >> >> uUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=kWCcj3-hcqYWZXIPhsERggD
> >>> >> >> LC
> >>> >> >> O-5g
> >>> >> >> ppCR
> >>> >> >> oS1Gav7r2A&e=
> >>> >> >>
> >>> >> >> The snomed rxnorm icd9 icd10:
> >>> >> >>
> >>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
> >>> >> >>
> >>> >> >> The svn root for the whole ugly thing is:
> >>> >> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
> >>> >> >>
> >>> >> >> Stats:
> >>> >> >> ctakessnorx2015
> >>> >> >> 545,913 Terms
> >>> >> >> 229,251 Concepts (Cuis)
> >>> >> >> 272,987 Snomed codes
> >>> >> >> 32,419 Rxnorm codes
> >>> >> >> 11,321 icd9 codes
> >>> >> >> 61 icd10 codes
> >>> >> >>
> >>> >> >> Ctakesicd2015
> >>> >> >> 611,230 Terms
> >>> >> >> 282,211 Concepts
> >>> >> >> 18,626 icd9 codes
> >>> >> >> 45,818 icd10 codes
> >>> >> >> Snomed and Rxnorm counts are the same
> >>> >> >>
> >>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts
> >>> >> >> and ~65,000 terms.
> >>> >> >>
> >>> >> >> I would like to move this all to a better root (not
> >>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't
> >>> >> >> able to write directly in trunk (??) and need to get moving
> >>> >> >> on to other things.
> >>> >> >>
> >>> >> >> There is help on the ctakes wiki:
> >>> >> >>
> >>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> >>> >> >> Though I should probably add a few items ...
> >>> >> >>
> >>> >> >>
> >>> >> >> Sean
> >>> >> >>
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> >>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
> >>> >> >> To: dev@ctakes.apache.org
> >>> >> >> Subject: RE: ctakes with icd10
> >>> >> >>
> >>> >> >> Not to perpetuate the instructions again but I sent these
> >>> >> >> out not long ago when I was going through the process and
> >>> >> >> Sean was helping me.
> >>> >> >>
> >>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT"
> >>> >> >>            to "SNOMEDCT_US"
> >>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script
> >>> >> >>            from memdbtemplate to location to put new UMLS DB
> >>> >> >>         3. Run DictionaryCreator2
> >>> >> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>> >> >>         4. Run CodeMapCreator
> >>> >> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>> >> >>         5. Copy new DB files to new location and create a
> >>> >> >>            copy of cTakesHsql.xml and update dictionary location
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> Brandon
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
> >>> >> >> To: dev@ctakes.apache.org
> >>> >> >> Subject: Re: ctakes with icd10
> >>> >> >>
> >>> >> >> This seems like a pretty common request and with such an old
> >>> >> >> version of UMLS database shipped with cTAKES it's only going
> >>> >> >> to get worse.
> >>> >> >> I've been wanting to build a dictionary using the latest
> >>> >> >> UMLS release (as well as a custom database), so would be
> >>> >> >> happy to write up the steps as I go through it. That assumes
> >>> >> >> that I can dig up the instructions in the dev list.
> >>> >> >>
> >>> >> >> - Dave
> >>> >> >>
> >>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
> >>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>> >> >>
> >>> >> >> > Hi Alaa,
> >>> >> >> >
> >>> >> >> > The -shortest- answer is that you'll need to run the
> >>> >> >> > dictionary creation tool.  There are instructions in older
> >>> >> >> > devlist threads.
> >>> >> >> > By default the dictionary creation tool does add icd9 and
> >>> >> >> > icd10 tables to the dictionary.
> >>> >> >> > The problem is that in Umls 2011AB those codes weren't
> >>> >> >> > very well populated.  The 2015AB icd# set is much more
> >>> >> >> > rich so those tables should be pretty good.  Then in
> >>> >> >> > ctakes you would look up annotations by icd9 or icd10
> >>> >> >> > codes instead of by cui:
> >>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> >>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >>> >> >> >
> >>> >> >> > Sean
> >>> >> >> >
> >>> >> >> > -----Original Message-----
> >>> >> >> > From: Savova, Guergana
> >>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
> >>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
> >>> >> >> > To: dev@ctakes.apache.org
> >>> >> >> > Subject: RE: ctakes with icd10
> >>> >> >> >
> >>> >> >> > Hi Alaa,
> >>> >> >> > You need to create a resource off the terminology/ontology
> >>> >> >> > you want to use (in this case ICD9 or ICD10). Then run
> >>> >> >> > that resource with cTAKES for the fast dictionary lookup.
> >>> >> >> > There is cTAKES code and some documentation on how to
> >>> >> >> > create that resource. By default, cTAKES runs with a
> >>> >> >> > resource created from the English version of SNOMED CT
> >>> >> >> > and RxNORM.
> >>> >> >> > Hope this helps.
> >>> >> >> > --Guergana
> >>> >> >> >
> >>> >> >> > -----Original Message-----
> >>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
> >>> >> >> > To: dev@ctakes.apache.org
> >>> >> >> > Subject: ctakes with icd10
> >>> >> >> >
> >>> >> >> > Hi,
> >>> >> >> >
> >>> >> >> > I downloaded Latest umls version, and I want to know how 
> >>> >> >> > to make ctakes work with icd10 and icd9.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > Thanks
> >>> >> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >>



--
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
I am really thankful.

I believe the issue is in here:

   static public void writeCuiCodes( final String termFilePath,
                                     final Map<String, Map<String, Collection<String>>> cuiCodes ) {
      final List<String> codeSources = Arrays.asList( "ICD10PCS", "ICD9CM", "RXNORM", "SNOMEDCT" );

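For readers following along, here is a minimal sketch of why a fixed list like the one quoted above would ignore an ICD10CM entry in the -src file, and one possible way around it. This is not the dictionarytool source: the class CodeSourceFilter and the method resolveSources are hypothetical names, and it assumes the sources file holds one source abbreviation per line and that upper-casing the names is acceptable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of dictionarytool.
final class CodeSourceFilter {

   // The fixed list quoted in the message above, used here only as a fallback.
   static final List<String> DEFAULT_SOURCES
         = Arrays.asList( "ICD10PCS", "ICD9CM", "RXNORM", "SNOMEDCT" );

   // Assumption: one source abbreviation per line, as in CtakesSources.txt.
   // Blank lines and duplicates are dropped; names are upper-cased.
   static List<String> resolveSources( final List<String> sourceFileLines ) {
      final List<String> sources = new ArrayList<>();
      for ( String line : sourceFileLines ) {
         final String name = line.trim().toUpperCase( Locale.ROOT );
         if ( !name.isEmpty() && !sources.contains( name ) ) {
            sources.add( name );
         }
      }
      return sources.isEmpty() ? DEFAULT_SOURCES : sources;
   }

   public static void main( final String[] args ) {
      // Prints [ICD10CM, SNOMEDCT_US]
      System.out.println( resolveSources( Arrays.asList( "icd10cm", "SNOMEDCT_US", "" ) ) );
   }
}
```

With a list resolved this way, code tables would be written for whatever the sources file names rather than for the four fixed vocabularies; whether the real tool can be changed this simply is untested here.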


On Thu, Dec 10, 2015 at 7:02 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Alaa,
>
> No worries, it can be pretty confusing.
>
> 1  The tool is a prototype hack.  So, it is in the svn sandbox
> https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool
>
> 2  I think that you should be able to add "icd10cm" as a line in your
> CtakesSources.txt data file.  But I should probably test it myself before
> you waste any time on it.  If a change to the code is required I can take
> care of it.  I will try to get to it around 10 pm Eastern time.
>
> 3  The lines that you are seeing look perfect.
>
> 4  The tuis inside CtakesAnatTuis.txt are the codes for the semantic types
> that comprise the "Anatomical Site" semantic group in ctakes.  You should
> never need to change the file, but you do need to point to it.
>
> 5  Unfortunately there isn't any real documentation, just help emails from
> myself and others on the devlist.  As I said, it is a prototype and not an
> official tool or part of the ctakes release.
>
> Sean
>
>
>
> -----Original Message-----
> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> Sent: Thursday, December 10, 2015 10:38 AM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>
> Hi Finan,
>
> I am sorry if I am asking too much but I am really stuck ...
>
> 1- Could you please give me a link where I can download the latest version
> of dictionarytool?
> 2- The current version I have always produces output for icd10pcs, although
> I have icd10CM in the -src file.  Is icd10pcs statically added inside
> dictionarytool?  If I change it from within the code, should it work?
> 3- After running the tool, lines like the ones below are added to the
> .script file.  Am I on the right track?
> INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')
> INSERT INTO CUI_TERMS VALUES(20417,0,2,'os hyoideum','os')
>
> 4- As naive as this sounds, what are the tuis inside CtakesAnatTuis.txt?
>
> 5- Is there any documentation you advise reading?
>
>
> On Thu, Dec 10, 2015 at 10:37 AM, Alaa al Barari <al...@gmail.com>
> wrote:
>
> > Finan, from where to download the 2015. properties from sourceforg.
> > those all ICDs and snowmed ?
> >
> > I prefer to learn how to generate my own db because I will need to
> > create my own later on, so your help is appreciated.
> >
> > On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari
> > <al...@gmail.com>
> > wrote:
> >
> >> Thank, but what I endup with is
> >> wrong ?
> >> On Dec 10, 2015 4:26 AM, "Finan, Sean"
> >> <Se...@childrens.harvard.edu>
> >> wrote:
> >>
> >>> Hi Alaa,
> >>>
> >>> If you downloaded the 2015 .property and .script files then you do
> >>> not need to run the dictionary creation tool.  Those databases are
> >>> already populated and ready to use.
> >>>
> >>> Sean
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >>> Sent: Wednesday, December 09, 2015 6:33 PM
> >>> To: dev@ctakes.apache.org
> >>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
> >>>
> >>> So basically it looks like the path had Desktop with a capital D;
> >>> that's why it did not work.
> >>>
> >>> I ended up having rows like this inside ctakesicd2015.script :
> >>>
> >>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
> >>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
> >>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
> >>>
> >>>
> >>> does this mean it worked ?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari
> >>> <alaa.albarari@gmail.com
> >>> >
> >>> wrote:
> >>>
> >>> > Thanks Finan and Brandon, your help is appreciated a lot.
> >>> >
> >>> > I downloaded the dictionary tool from
> >>> > https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
> >>> > I hope its the latest and bug free.
> >>> >
> >>> >
> >>> > my running command is : java -cp ./dictionarytool.jar:lib/*
> >>> > org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
> >>> > /home/abarari/Desktop/umls/2015AB/META/ -atui
> >>> > ./data/optional/CtakesAnatTuis.txt -db
> >>> > jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctake
> >>> > sicd
> >>> > 2015 -tbl CUI_TERMS -df ./data/optional/ -src
> >>> > ./data/small/ConversionSources.txt
> >>> > -tui ./data/optional/CtakesAllTuis.txt
> >>> >
> >>> >
> >>> >
> >>> > I am running on ubuntu by the way ... anyway under
> >>> > /home/abarari/Desktop/dictionarytool/output/
> >>> >
> >>> > there is only
> >>> >
> >>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
> >>> > ctakesicd2015.log ctakesicd2015.properties  ctakesicd2015.script
> >>> >
> >>> >
> >>> > where is the database ? am I doing something wrong ? do I need to
> >>> > create the database before executing the dictionarytool or what ?
> >>> >
> >>> >
> >>> > I found couple of issues in the dictionary tool, it does not work
> >>> > well with relative paths.
> >>> >
> >>> >
> >>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
> >>> >
> >>> >> Brandon,
> >>> >> That sounds great!
> >>> >> Please open a Jira ticket for any contributions (anyone should be
> >>> >> able to create a Jira account).  There are some legal items built
> >>> >> into the ASF Jira attachments for accepting contributions/donations.
> >>> >> It will also credit the contributors with the merit appropriately.
> >>> >> Anyone who is interested can follow the Jira item. (Even better
> >>> >> if contributions were open discussion/open development.) --Pei
> >>> >>
> >>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
> >>> >> <bd...@geisinger.edu> wrote:
> >>> >> > I'd be interested in contributing to making the dictionary tool
> >>> >> > more
> >>> >> user friendly with a GUI.
> >>> >> >
> >>> >> > Thanks,
> >>> >> > Brandon
> >>> >> >
> >>> >> > -----Original Message-----
> >>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> >>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
> >>> >> > To: dev@ctakes.apache.org
> >>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
> >>> sourceforge!
> >>> >> >
> >>> >> > Hi Dave,
> >>> >> >
> >>> >> > I'm always happy to see interest in our stuff!
> >>> >> >
> >>> >> >>Step 1
> >>> >> > I built the tool to be able to build a dictionary using
> >>> >> > anything in the umls - snomed, icd9, hpo, etc. so using the
> >>> >> > veterinary extension shouldn't be a problem.  You just add it
> >>> >> > to the CtakesSources file (or create an alternate file and
> >>> >> > point to it with -src).  To answer another of your questions,
> >>> >> > there can be zero or more sources - you saw snomedct and
> >>> >> > snomedct_us (each valid in a different umls version).
> >>> >> > It also can include any semantic type, just add (or remove)
> >>> >> > the appropriate tuis in a different data file.
> >>> >> >
> >>> >> >>Step 2
> >>> >> > You have it right - you copy the templates to another location
> >>> >> > and output to that location.  Otherwise you 'lose' your
> >>> >> > templates.
> >>> >> >
> >>> >> >>Step 3 and 4
> >>> >> > The jar is built from source.  I need to (soon) check in
> >>> >> > updates to the source, and at the same time I can check in a
> >>> >> > default prebuilt .jar.  The lib/ directory is in the source
> >>> >> > repository.
> >>> >> >
> >>> >> > Various people have toyed with the idea of putting the tool
> >>> >> > into a ctakes module, putting it into an "installation
> >>> >> > package", making a gui ...  The best option (imo) is probably
> >>> >> > to make an easy to use gui and keep a pre-built version in
> >>> >> > sandbox.  Someday, after the rainbow, maybe I'll get a chance
> >>> >> > to do that ...
> >>> >> >
> >>> >> > Sean
> >>> >> >
> >>> >> >
> >>> >> > -----Original Message-----
> >>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
> >>> >> > To: dev@ctakes.apache.org
> >>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
> >>> sourceforge!
> >>> >> >
> >>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to
> >>> >> > date
> >>> >> database out of the box. Hopefully it will cut down on the need
> >>> >> for many to build their own DB's. Thank you much for doing that.
> >>> >> >
> >>> >> > Unfortunately, I still will need to build a custom one for us.
> >>> >> > I work
> >>> >> in veterinary medicine so I need to add in the veterinary
> >>> >> extension for SNOMED-CT into the database.
> >>> >> >
> >>> >> > I looked over the steps below that Brandon included and have
> >>> >> > some
> >>> >> questions:
> >>> >> >
> >>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
> >>> >> > "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines
> >>> >> > in it. First line is SNOMED, second line is SNOMEDCT_US. So
> >>> >> > this step doesn't really make sense.
> >>> >> >
> >>> >> > step 2 should reference the two scripts as being in
> >>> >> > resource/memdbtemplate so others don't have to search for them.
> >>> >> > Not sure what it means to move them to "location to put new
> >>> >> > UMLS DB". Does that mean move them into a new directory where
> >>> >> > the newly created UMLS DB will get written?
> >>> >> >
> >>> >> > steps 3 and 4 for running the tools reference
> >>> >> > dictionarytool.jar which doesn't exist. Does one need to build
> >>> >> > that somehow from the source before running it? The command
> >>> >> > line also adds "lib/*" to the classpath. Is that the lib
> >>> >> > directory inside the dictionarytool source code or some other
> >>> >> > location?
> >>> >> >
> >>> >> > What else would I need to do to include the SNOMED-CT
> >>> >> > Veterinary
> >>> >> Extension along with the snomedct and rxnorm sources?
> >>> >> >
> >>> >> > I'll probably not have time to try this out for a while yet,
> >>> >> > but when I
> >>> >> do I'd be happy to write up an easy to follow tutorial for
> >>> >> building a custom dictionary assuming I am able to get it to work.
> >>> >> >
> >>> >> > Has anyone considered making this tool available outside of the
> >>> >> > source
> >>> >> code itself? Like including it in the main cTAKES release? It
> >>> >> seems there is demand for it.
> >>> >> >
> >>> >> > - Dave
> >>> >> >
> >>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
> >>> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>> >> >
> >>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
> >>> >> >>
> >>> >> >> I have checked in two new hsqldb dictionaries, both from the
> >>> >> >> 2015AB version of the UMLS.  They both have codes for
> >>> >> >> snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the
> >>> >> >> usual cui, tui, preferred term mappings.
> >>> >> >>
> >>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds
> >>> >> >> cuis filtered by icd9 and icd10.
> >>> >> >> What this means:  Cuis that exist for a [filter source] are
> >>> >> >> added to the dictionary, as are all text variations from all
> >>> >> >> sources that contain that cui.  Both dictionaries also use the
> >>> >> >> standard ctakes semantic group tui filters.
> >>> >> >>
> >>> >> >> The names are ctakessnorx2015 and ctakesicd2015
> >>> >> >>
> >>> >> >> The snomed rxnorm :
> >>> >> >>
> >>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
> >>> >> >>
> >>> >> >> The snomed rxnorm icd9 icd10:
> >>> >> >>
> >>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
> >>> >> >>
> >>> >> >> The svn root for the whole ugly thing is:
> >>> >> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
> >>> >> >>
> >>> >> >> Stats:
> >>> >> >> ctakessnorx2015
> >>> >> >> 545,913 Terms
> >>> >> >> 229,251 Concepts (Cuis)
> >>> >> >> 272,987 Snomed codes
> >>> >> >> 32,419 Rxnorm codes
> >>> >> >> 11,321 icd9 codes
> >>> >> >> 61 icd10 codes
> >>> >> >>
> >>> >> >> Ctakesicd2015
> >>> >> >> 611,230 Terms
> >>> >> >> 282,211 Concepts
> >>> >> >> 18,626 icd9 codes
> >>> >> >> 45,818 icd10 codes
> >>> >> >> Snomed and Rxnorm counts are the same
> >>> >> >>
> >>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts
> >>> >> >> and ~65,000 terms.
> >>> >> >>
> >>> >> >> I would like to move this all to a better root (not
> >>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able
> >>> >> >> to write directly in trunk (??) and need to get moving on to
> >>> >> >> other things.
> >>> >> >>
> >>> >> >> There is help on the ctakes wiki:
> >>> >> >>
> >>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> >>> >> >> Though I should probably add a few items ...
> >>> >> >>
> >>> >> >>
> >>> >> >> Sean
> >>> >> >>
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> >>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
> >>> >> >> To: dev@ctakes.apache.org
> >>> >> >> Subject: RE: ctakes with icd10
> >>> >> >>
> >>> >> >> Not to perpetuate the instructions again but I sent these out
> >>> >> >> not long ago when I was going through the process and Sean was
> >>> >> >> helping me.
> >>> >> >>
> >>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT"
> >>> >> >>            to "SNOMEDCT_US"
> >>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script
> >>> >> >>            from memdbtemplate to location to put new UMLS DB
> >>> >> >>         3. Run DictionaryCreator2
> >>> >> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>> >> >>         4. Run CodeMapCreator
> >>> >> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>> >> >>         5. Copy new DB files to new location and create a copy
> >>> >> >>            of cTakesHsql.xml and update dictionary location
> >>> >> >>
> >>> >> >> Thanks,
> >>> >> >> Brandon
> >>> >> >>
> >>> >> >> -----Original Message-----
> >>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
> >>> >> >> To: dev@ctakes.apache.org
> >>> >> >> Subject: Re: ctakes with icd10
> >>> >> >>
> >>> >> >> This seems like a pretty common request and with such an old
> >>> >> >> version of UMLS database shipped with cTAKES it's only going
> >>> >> >> to get worse.
> >>> >> >> I've been wanting to build a dictionary using the latest UMLS
> >>> >> >> release (as well as a custom database), so would be happy to
> >>> >> >> write up the steps as I go through it. That assumes that I can
> >>> >> >> dig up the instructions in the dev list.
> >>> >> >>
> >>> >> >> - Dave
> >>> >> >>
> >>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
> >>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>> >> >>
> >>> >> >> > Hi Alaa,
> >>> >> >> >
> >>> >> >> > The -shortest- answer is that you'll need to run the
> >>> >> >> > dictionary creation tool.  There are instructions in older
> >>> >> >> > devlist threads.
> >>> >> >> > By default the dictionary creation tool does add icd9 and
> >>> >> >> > icd10 tables to the dictionary.
> >>> >> >> > The problem is that in Umls 2011AB those codes weren't very
> >>> >> >> > well populated.  The 2015AB icd# set is much more rich so
> >>> >> >> > those tables should be pretty good.  Then in ctakes you
> >>> >> >> > would look up annotations by icd9 or icd10 codes instead of
> >>> >> >> > by cui:
> >>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> >>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >>> >> >> >
> >>> >> >> > Sean
> >>> >> >> >
> >>> >> >> > -----Original Message-----
> >>> >> >> > From: Savova, Guergana
> >>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
> >>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
> >>> >> >> > To: dev@ctakes.apache.org
> >>> >> >> > Subject: RE: ctakes with icd10
> >>> >> >> >
> >>> >> >> > Hi Alaa,
> >>> >> >> > You need to create a resource off the terminology/ontology
> >>> >> >> > you want to use (in this case ICD9 or ICD10). Then run that
> >>> >> >> > resource with cTAKES for the fast dictionary lookup. There
> >>> >> >> > is cTAKES code and some documentation on how to create that
> >>> >> >> > resource. By default, cTAKES runs with a resource created
> >>> >> >> > from the English version of SNOMED CT and RxNORM.
> >>> >> >> > Hope this helps.
> >>> >> >> > --Guergana
> >>> >> >> >
> >>> >> >> > -----Original Message-----
> >>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
> >>> >> >> > To: dev@ctakes.apache.org
> >>> >> >> > Subject: ctakes with icd10
> >>> >> >> >
> >>> >> >> > Hi,
> >>> >> >> >
> >>> >> >> > I downloaded Latest umls version, and I want to know how to
> >>> >> >> > make ctakes work with icd10 and icd9.
> >>> >> >> >
> >>> >> >> >
> >>> >> >> > Thanks
> >>> >> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >>



-- 
Eng Alaa Al-Barari
phone 0599297470

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Alaa,

No worries, it can be pretty confusing.  

1  The tool is a prototype hack.  So, it is in the svn sandbox
https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool

2  I think that you should be able to add "icd10cm" as a line in your CtakesSources.txt data file.  But I should probably test it myself before you waste any time on it.  If a change to the code is required I can take care of it.  I will try to get to it around 10 pm Eastern time.

3  The lines that you are seeing look perfect.

4  The tuis inside CtakesAnatTuis.txt are the codes for the semantic types that comprise the "Anatomical Site" semantic group in ctakes.  You should never need to change the file, but you do need to point to it.

5  Unfortunately there isn't any real documentation, just help emails from myself and others on the devlist.  As I said, it is a prototype and not an official tool or part of the ctakes release.
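
As a rough aid for item 3, the sketch below sanity-checks one CUI_TERMS row. It is not part of dictionarytool or ctakes: the class CuiTermRow and its methods are hypothetical, and the column meaning (cui number, index of the rare word, token count, text, rare word) is an assumption inferred from the sample rows quoted in this thread.

```java
import java.util.Locale;

// Hypothetical checker for illustration only; not part of dictionarytool or ctakes.
final class CuiTermRow {

   // Assumed column order: cui number, rare-word index, token count, text, rare word.
   // Returns true when the text has tokenCount space-separated tokens and the
   // token at rareWordIndex equals rareWord.
   static boolean isConsistent( final int rareWordIndex, final int tokenCount,
                                final String text, final String rareWord ) {
      final String[] tokens = text.split( " " );
      return tokens.length == tokenCount
            && rareWordIndex >= 0
            && rareWordIndex < tokens.length
            && tokens[ rareWordIndex ].equals( rareWord );
   }

   // Assumption: the numeric column is the CUI without its "C" prefix and zero padding.
   static String toCui( final long cuiNumber ) {
      return String.format( Locale.ROOT, "C%07d", cuiNumber );
   }

   public static void main( final String[] args ) {
      // Prints C0020417 true
      System.out.println( toCui( 20417 ) + " " + isConsistent( 1, 2, "hyoid bones", "bones" ) );
   }
}
```

Under those assumptions both sample rows from the question ('hyoid bones' and 'os hyoideum') pass the check, which fits the answer given in item 3.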

Sean



-----Original Message-----
From: Alaa al Barari [mailto:alaa.albarari@gmail.com] 
Sent: Thursday, December 10, 2015 10:38 AM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Hi Finan,

I am sorry if I am asking too much but I am really stuck ...

1- Could you please give me a link where I can download the latest version of dictionarytool?
2- The current version I have always produces output for icd10pcs, although I have icd10CM in the -src file.  Is icd10pcs statically added inside dictionarytool?  If I change it from within the code, should it work?
3- After running the tool, lines like the ones below are added to the .script file.  Am I on the right track?
INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')
INSERT INTO CUI_TERMS VALUES(20417,0,2,'os hyoideum','os')

4- As naive as this sounds, what are the tuis inside CtakesAnatTuis.txt?

5- Is there any documentation you advise reading?


On Thu, Dec 10, 2015 at 10:37 AM, Alaa al Barari <al...@gmail.com>
wrote:

> Finan, where can I download the 2015 .properties files from SourceForge?
> Do they cover all the ICDs and SNOMED?
>
> I prefer to learn how to generate my own db because I will need to 
> create my own later on, so your help is appreciated.
>
> On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari 
> <al...@gmail.com>
> wrote:
>
>> Thanks, but is what I ended up with wrong?
>> On Dec 10, 2015 4:26 AM, "Finan, Sean" 
>> <Se...@childrens.harvard.edu>
>> wrote:
>>
>>> Hi Alaa,
>>>
>>> If you downloaded the 2015 .properties and .script files then you do
>>> not need to run the dictionary creation tool.  Those databases are 
>>> already populated and ready to use.
>>>
>>> Sean
>>>
>>>
>>> -----Original Message-----
>>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> Sent: Wednesday, December 09, 2015 6:33 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>>
>>> So basically it looks like the path has Desktop with a capital D; that's
>>> why it did not work.
>>>
>>> I ended up having rows like this inside ctakesicd2015.script:
>>>
>>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
>>>
>>>
>>> Does this mean it worked?
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <alaa.albarari@gmail.com>
>>> wrote:
>>>
>>> > Thanks Finan and Brandon, your help is appreciated a lot.
>>> >
>>> > I downloaded the dictionary tool from 
>>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_sandbox_dictionarytool_bin_dictionarytool.zip&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=uJq_3OpLiUaBOz9vqxKBI-gUAtLhJMme9uKXqroHhMM&s=JVOlLM08gTn5rV2T3R_bqeZT8XbMDgLhfKg8Fo5mAQw&e=
>>> > I hope its the latest and bug free.
>>> >
>>> >
>>> > my running command is : java -cp ./dictionarytool.jar:lib/*
>>> > org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls 
>>> > /home/abarari/Desktop/umls/2015AB/META/ -atui 
>>> > ./data/optional/CtakesAnatTuis.txt -db 
>>> > jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015
>>> > -tbl CUI_TERMS -df ./data/optional/ -src
>>> > ./data/small/ConversionSources.txt
>>> > -tui ./data/optional/CtakesAllTuis.txt
>>> >
>>> >
>>> >
>>> > I am running on ubuntu by the way ... anyway under 
>>> > /home/abarari/Desktop/dictionarytool/output/
>>> >
>>> > there is only
>>> >
>>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls 
>>> > ctakesicd2015.log ctakesicd2015.properties  ctakesicd2015.script
>>> >
>>> >
>>> > where is the database ? am I doing something wrong ? do I need to 
>>> > create the database before executing the dictionarytool or what ?
>>> >
>>> >
>>> > I found couple of issues in the dictionary tool, it does not work 
>>> > well with relative paths.
>>> >
>>> >
>>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
>>> >
>>> >> Brandon,
>>> >> That sounds great!
>>> >> Please open a Jira ticket for any contributions (anyone should be 
>>> >> able to create a Jira account).  There are some legal items built 
>>> >> into the ASF Jira attachments for accepting contributions/donations.
>>> >> It will also credit the contributors with the merit appropriately.
>>> >> Anyone who is interested can follow the Jira item. (Even better 
>>> >> if contributions were open discussion/open development.) --Pei
>>> >>
>>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>>> >> <bd...@geisinger.edu> wrote:
>>> >> > I'd be interested in contributing to making the dictionary tool 
>>> >> > more
>>> >> user friendly with a GUI.
>>> >> >
>>> >> > Thanks,
>>> >> > Brandon
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
>>> sourceforge!
>>> >> >
>>> >> > Hi Dave,
>>> >> >
>>> >> > I'm always happy to see interest in our stuff!
>>> >> >
>>> >> >>Step 1
>>> >> > I built the tool to be able to build a dictionary using 
>>> >> > anything in the
>>> >> umls - snomed, icd9, hpo, etc. so using the veterinary extension 
>>> >> shouldn't be a problem.  You just add it to the CtakesSources 
>>> >> file (or create an alternate file and point to it with -src).  To 
>>> >> answer another of your questions, there can be zero or more 
>>> >> sources - you saw snomedct and snomedct_us (each valid in a different umls version).
>>> >> > It also can include any semantic type, just add (or remove) the
>>> >> appropriate tuis in a different data file.
>>> >> >
>>> >> >>Step 2
>>> >> > You have it right - you copy the templates to another location 
>>> >> > and
>>> >> output to that location.  Otherwise you 'lose' your templates.
>>> >> >
>>> >> >>Step 3 and 4
>>> >> > The jar is built from source.  I need to (soon) check in 
>>> >> > updates to the
>>> >> source, and at the same time I can check in a default prebuilt 
>>> >> .jar The lib/ directory is in the source repository.
>>> >> >
>>> >> > Various people have toyed with the idea of putting the tool 
>>> >> > into a
>>> >> ctakes module, putting it into an "installation package", making 
>>> >> a
>>> gui ...
>>> >> The best option (imo) is probably to make an easy to use gui and 
>>> >> keep a pre-built version in sandbox.  Someday, after the rainbow, 
>>> >> maybe I'll get a chance to do that ...
>>> >> >
>>> >> > Sean
>>> >> >
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
>>> sourceforge!
>>> >> >
>>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to 
>>> >> > date
>>> >> database out of the box. Hopefully it will cut down on the need 
>>> >> for many to build their own DB's. Thank you much for doing that.
>>> >> >
>>> >> > Unfortunately, I still will need to build a custom one for us. 
>>> >> > I work
>>> >> in veterinary medicine so I need to add in the veterinary 
>>> >> extension for SNOMED-CT into the database.
>>> >> >
>>> >> > I looked over the steps below that Brandon included and have 
>>> >> > some
>>> >> questions:
>>> >> >
>>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
>>> "SNOMEDCT"
>>> >> to "SNOMEDCT_US". The file that I have has two lines in it. First 
>>> >> line is SNOMED, second line is SNOMEDCT_US. So this step doesn't
>>> really make sense.
>>> >> >
>>> >> > step 2 should reference the two scripts as being in
>>> >> resource/memdbtemplate so others don't have to search for them. 
>>> >> Not sure what it means to move them to "location to put new UMLS DB".
>>> >> Does that mean move them into a new directory where the newly 
>>> >> created UMLS DB will get written?
>>> >> >
>>> >> > steps 3 and 4 for running the tools reference 
>>> >> > dictionarytool.jar which
>>> >> doesn't exist. Does one need to build that somehow from the 
>>> >> source before running it? The command line also adds "lib/*" to 
>>> >> the classpath. Is that the lib directory inside the 
>>> >> dictionarytool source code or some other location?
>>> >> >
>>> >> > What else would I need to do to include the SNOMED-CT 
>>> >> > Veterinary
>>> >> Extension along with the snomedct and rxnorm sources?
>>> >> >
>>> >> > I'll probably not have time to try this out for a while yet, 
>>> >> > but when I
>>> >> do I'd be happy to write up an easy to follow tutorial for 
>>> >> building a custom dictionary assuming I am able to get it to work.
>>> >> >
>>> >> > Has anyone considered making this tool available outside of the 
>>> >> > source
>>> >> code itself? Like including it in the main cTAKES release? It 
>>> >> seems there is demand for it.
>>> >> >
>>> >> > - Dave
>>> >> >
>>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
>>> >> Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >
>>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
>>> >> >>
>>> >> >> I have checked in two new hsqldb dictionaries, both from the 
>>> >> >> 2015AB version of the UMLS.  They both have codes for 
>>> >> >> snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the 
>>> >> >> usual cui, tui, preferred term
>>> >> mappings.
>>> >> >>
>>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds 
>>> >> >> cuis filtered by icd9 and icd10.
>>> >> >> What this means:  Cuis that exist for a [filter source] are 
>>> >> >> added to the dictionary, as are all text variations from all 
>>> >> >> sources that contain that cui.  Both dictionaries also use the 
>>> >> >> standard ctakes semantic group tui filters.
>>> >> >>
>>> >> >> The names are ctakessnorx2015 and ctakesicd2015
>>> >> >>
>>> >> >> The snomed rxnorm :
>>> >> >>
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__sourceforge.net_p_ctakesresources_code_HEAD_tree_trunk_ctakes-2Dresources-2Dsnomed-2Drword-2Dhsqldb-2D2011ab_src_main_resources_org_apache_ctakes_dictionary_lookup_fast_ctakessnorx2015_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=kWCcj3-hcqYWZXIPhsERggDLCO-5gppCRoS1Gav7r2A&e=
>>> >> >>
>>> >> >> The snomed rxnorm icd9 icd10:
>>> >> >>
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__sourceforge.net_p_ctakesresources_code_HEAD_tree_trunk_ctakes-2Dresources-2Dsnomed-2Drword-2Dhsqldb-2D2011ab_src_main_resources_org_apache_ctakes_dictionary_lookup_fast_ctakesicd2015_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=RZ--ZQ2qvGnhm4h2Vvz1oU97qA8BG2G39Tww7EdYgKA&e=
>>> >> >>
>>> >> >> The svn root for the whole ugly thing is:
>>> >> >>  svn checkout 
>>> >> >> svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>> >> >>
>>> >> >> Stats:
>>> >> >> ctakessnorx2015
>>> >> >> 545,913 Terms
>>> >> >> 229,251 Concepts (Cuis)
>>> >> >> 272,987 Snomed codes
>>> >> >> 32,419 Rxnorm codes
>>> >> >> 11,321 icd9 codes
>>> >> >> 61 icd10 codes
>>> >> >>
>>> >> >> Ctakesicd2015
>>> >> >> 611,230 Terms
>>> >> >> 282,211 Concepts
>>> >> >> 18,626 icd9 codes
>>> >> >> 45,818 icd10 codes
>>> >> >> Snomed and Rxnorm counts are the same
>>> >> >>
>>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts 
>>> >> >> and
>>> >> >> ~65,000 terms.
>>> >> >>
>>> >> >> I would like to move this all to a better root (not
>>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able 
>>> >> >> to write directly in trunk (??) and need to get moving on to 
>>> >> >> other
>>> things.
>>> >> >>
>>> >> >> There is help on the ctakes wiki:
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_cTAKES-2B3.2-2B-2D-2BFast-2BDictionary-2BLookup&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=98W_vAHGZ2FLEMPfrSgEHtZt-mQ3XJjF6yQYM26tqP4&e=
>>> >> >> Though I should probably add a few items ...
>>> >> >>
>>> >> >>
>>> >> >> Sean
>>> >> >>
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: RE: ctakes with icd10
>>> >> >>
>>> >> >> Not to perpetuate the instructions again but I sent these out 
>>> >> >> not long ago when I was going through the process and Sean was 
>>> >> >> helping
>>> me.
>>> >> >>
>>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>>> >> >>         3. Run DictionaryCreator2
>>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         4. Run CodeMapCreator
>>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Brandon
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: Re: ctakes with icd10
>>> >> >>
>>> >> >> This seems like a pretty common request and with such an old 
>>> >> >> version of UMLS database shipped with cTAKES it's only going 
>>> >> >> to
>>> get worse.
>>> >> >> I've been wanting to build a dictionary using the latest UMLS 
>>> >> >> release (as well as a custom database), so would be happy to 
>>> >> >> write up the steps as I go through it. That assumes that I can 
>>> >> >> dig up the
>>> >> instructions in the dev list.
>>> >> >>
>>> >> >> - Dave
>>> >> >>
>>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
>>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >>
>>> >> >> > Hi Alaa,
>>> >> >> >
>>> >> >> > The -shortest- answer is that you'll need to run the 
>>> >> >> > dictionary creation tool.  There are instructions in older devlist threads.
>>> >> >> > By default the dictionary creation tool does add icd9 and 
>>> >> >> > icd10 tables to
>>> >> >> the dictionary.
>>> >> >> > The problem is that in Umls 2011AB those codes weren't very 
>>> >> >> > well populated.  The 2015AB icd# set is much more rich so 
>>> >> >> > those tables should be pretty good.  Then in ctakes you 
>>> >> >> > would look up annotations by icd9 or icd10 codes instead of by cui:
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>>> >> >> >
>>> >> >> > Sean
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Savova, Guergana
>>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: RE: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi Alaa,
>>> >> >> > You need to create a resource off the terminology/ontology 
>>> >> >> > you want to use (in this case ICD9 or ICD10). Then run that 
>>> >> >> > resource with cTAKES for the fast dictionary lookup. There 
>>> >> >> > is cTAKES code and some documentation on how to create that 
>>> >> >> > resource. By default, cTAKES runs with a resource created 
>>> >> >> > from the English version of SNOMED CT
>>> >> and RxNORM.
>>> >> >> > Hope this helps.
>>> >> >> > --Guergana
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I downloaded Latest umls version, and I want to know how to 
>>> >> >> > make ctakes work with icd10 and icd9.
>>> >> >> >
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Eng Alaa Al-Barari
>>> > phone 0599297470
>>> >
>>>
>>>
>>>
>>> --
>>> Eng Alaa Al-Barari
>>> phone 0599297470
>>>
>>
>
>
> --
> Eng Alaa Al-Barari
> phone 0599297470
>



--
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
Hi Finan,

I am sorry if I am asking too much but I am really stuck ...

1- Could you please give me a link where I can download the latest version
of dictionarytool?
2- The version I have always produces output for icd10pcs, even though my
-src file lists icd10CM.  Is icd10pcs statically added inside
dictionarytool?  If I change it in the code, should it work?
3- After running the tool, lines like the ones below are added to the
.script file.  Am I on the right track?
INSERT INTO CUI_TERMS VALUES(20417,1,2,'hyoid bones','bones')
INSERT INTO CUI_TERMS VALUES(20417,0,2,'os hyoideum','os')

4- As naive as this sounds, what are the tuis inside CtakesAnatTuis.txt?

5- Is there any documentation you would advise me to read?
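
On question 3, the first value in each row appears to be the concept's UMLS CUI stored as a plain number. The sketch below converts between the two forms under that assumption (UMLS CUIs are written as "C" followed by seven digits). CuiFormat and its methods are hypothetical helper names, not dictionarytool or cTAKES code.

```java
import java.util.Locale;

/** Illustrative conversion between a numeric id and a UMLS-style CUI string. */
final class CuiFormat {

    private CuiFormat() {
    }

    /** Formats a numeric id as 'C' plus seven zero-padded digits. */
    static String toCui(long numericId) {
        if (numericId < 0 || numericId > 9_999_999L) {
            throw new IllegalArgumentException("Out of range for a 7-digit CUI: " + numericId);
        }
        return String.format(Locale.ROOT, "C%07d", numericId);
    }

    /** Parses a CUI such as C0020417 back to its numeric part. */
    static long toNumericId(String cui) {
        if (cui == null || !cui.matches("C\\d{7}")) {
            throw new IllegalArgumentException("Not a CUI: " + cui);
        }
        return Long.parseLong(cui.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(toCui(20417L));          // prints C0020417
        System.out.println(toNumericId("C2723481")); // prints 2723481
    }
}
```

If that reading is right, the two sample rows above both belong to one concept, with 'hyoid bones' and 'os hyoideum' as text variants of it.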


On Thu, Dec 10, 2015 at 10:37 AM, Alaa al Barari <al...@gmail.com>
wrote:

> Finan, where can I download the 2015 .properties files from SourceForge?
> Do those include all the ICD codes and SNOMED?
>
> I prefer to learn how to generate my own db because I will need to create
> my own later on, so your help is appreciated.
>
> On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari <al...@gmail.com>
> wrote:
>
>> Thanks, but is what I ended up with wrong?
>> On Dec 10, 2015 4:26 AM, "Finan, Sean" <Se...@childrens.harvard.edu>
>> wrote:
>>
>>> Hi Alaa,
>>>
>>> If you downloaded the 2015 .properties and .script files then you do not
>>> need to run the dictionary creation tool.  Those databases are already
>>> populated and ready to use.
>>>
>>> Sean
>>>
>>>
>>> -----Original Message-----
>>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> Sent: Wednesday, December 09, 2015 6:33 PM
>>> To: dev@ctakes.apache.org
>>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>>
>>> So basically it looks like the path has Desktop with a capital D; that's
>>> why it did not work.
>>>
>>> I ended up having rows like this inside ctakesicd2015.script:
>>>
>>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
>>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
>>>
>>>
>>> Does this mean it worked?
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <alaa.albarari@gmail.com>
>>> wrote:
>>>
>>> > Thanks Finan and Brandon, your help is appreciated a lot.
>>> >
>>> > I downloaded the dictionary tool from
>>> > https://urldefense.proofpoint.com/v2/url?u=https-3A__svn.apache.org_repos_asf_ctakes_sandbox_dictionarytool_bin_dictionarytool.zip&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=uJq_3OpLiUaBOz9vqxKBI-gUAtLhJMme9uKXqroHhMM&s=JVOlLM08gTn5rV2T3R_bqeZT8XbMDgLhfKg8Fo5mAQw&e=
>>> > I hope its the latest and bug free.
>>> >
>>> >
>>> > my running command is : java -cp ./dictionarytool.jar:lib/*
>>> > org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
>>> > /home/abarari/Desktop/umls/2015AB/META/ -atui
>>> > ./data/optional/CtakesAnatTuis.txt -db
>>> > jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015
>>> > -tbl CUI_TERMS -df ./data/optional/ -src
>>> > ./data/small/ConversionSources.txt
>>> > -tui ./data/optional/CtakesAllTuis.txt
>>> >
>>> >
>>> >
>>> > I am running on ubuntu by the way ... anyway under
>>> > /home/abarari/Desktop/dictionarytool/output/
>>> >
>>> > there is only
>>> >
>>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls ctakesicd2015.log
>>> > ctakesicd2015.properties  ctakesicd2015.script
>>> >
>>> >
>>> > where is the database ? am I doing something wrong ? do I need to
>>> > create the database before executing the dictionarytool or what ?
>>> >
>>> >
>>> > I found couple of issues in the dictionary tool, it does not work well
>>> > with relative paths.
>>> >
>>> >
>>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
>>> >
>>> >> Brandon,
>>> >> That sounds great!
>>> >> Please open a Jira ticket for any contributions (anyone should be
>>> >> able to create a Jira account).  There are some legal items built
>>> >> into the ASF Jira attachments for accepting contributions/donations.
>>> >> It will also credit the contributors with the merit appropriately.
>>> >> Anyone who is interested can follow the Jira item. (Even better if
>>> >> contributions were open discussion/open development.) --Pei
>>> >>
>>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>>> >> <bd...@geisinger.edu> wrote:
>>> >> > I'd be interested in contributing to making the dictionary tool
>>> >> > more
>>> >> user friendly with a GUI.
>>> >> >
>>> >> > Thanks,
>>> >> > Brandon
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
>>> sourceforge!
>>> >> >
>>> >> > Hi Dave,
>>> >> >
>>> >> > I'm always happy to see interest in our stuff!
>>> >> >
>>> >> >>Step 1
>>> >> > I built the tool to be able to build a dictionary using anything in
>>> >> > the
>>> >> umls - snomed, icd9, hpo, etc. so using the veterinary extension
>>> >> shouldn't be a problem.  You just add it to the CtakesSources file
>>> >> (or create an alternate file and point to it with -src).  To answer
>>> >> another of your questions, there can be zero or more sources - you
>>> >> saw snomedct and snomedct_us (each valid in a different umls version).
>>> >> > It also can include any semantic type, just add (or remove) the
>>> >> appropriate tuis in a different data file.
>>> >> >
>>> >> >>Step 2
>>> >> > You have it right - you copy the templates to another location and
>>> >> output to that location.  Otherwise you 'lose' your templates.
>>> >> >
>>> >> >>Step 3 and 4
>>> >> > The jar is built from source.  I need to (soon) check in updates to
>>> >> > the
>>> >> source, and at the same time I can check in a default prebuilt .jar
>>> >> The lib/ directory is in the source repository.
>>> >> >
>>> >> > Various people have toyed with the idea of putting the tool into a
>>> >> ctakes module, putting it into an "installation package", making a
>>> gui ...
>>> >> The best option (imo) is probably to make an easy to use gui and keep
>>> >> a pre-built version in sandbox.  Someday, after the rainbow, maybe
>>> >> I'll get a chance to do that ...
>>> >> >
>>> >> > Sean
>>> >> >
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
>>> >> > To: dev@ctakes.apache.org
>>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
>>> sourceforge!
>>> >> >
>>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to date
>>> >> database out of the box. Hopefully it will cut down on the need for
>>> >> many to build their own DB's. Thank you much for doing that.
>>> >> >
>>> >> > Unfortunately, I still will need to build a custom one for us. I
>>> >> > work
>>> >> in veterinary medicine so I need to add in the veterinary extension
>>> >> for SNOMED-CT into the database.
>>> >> >
>>> >> > I looked over the steps below that Brandon included and have some
>>> >> questions:
>>> >> >
>>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
>>> "SNOMEDCT"
>>> >> to "SNOMEDCT_US". The file that I have has two lines in it. First
>>> >> line is SNOMED, second line is SNOMEDCT_US. So this step doesn't
>>> really make sense.
>>> >> >
>>> >> > step 2 should reference the two scripts as being in
>>> >> resource/memdbtemplate so others don't have to search for them. Not
>>> >> sure what it means to move them to "location to put new UMLS DB".
>>> >> Does that mean move them into a new directory where the newly created
>>> >> UMLS DB will get written?
>>> >> >
>>> >> > steps 3 and 4 for running the tools reference dictionarytool.jar
>>> >> > which
>>> >> doesn't exist. Does one need to build that somehow from the source
>>> >> before running it? The command line also adds "lib/*" to the
>>> >> classpath. Is that the lib directory inside the dictionarytool source
>>> >> code or some other location?
>>> >> >
>>> >> > What else would I need to do to include the SNOMED-CT Veterinary
>>> >> Extension along with the snomedct and rxnorm sources?
>>> >> >
>>> >> > I'll probably not have time to try this out for a while yet, but
>>> >> > when I
>>> >> do I'd be happy to write up an easy to follow tutorial for building a
>>> >> custom dictionary assuming I am able to get it to work.
>>> >> >
>>> >> > Has anyone considered making this tool available outside of the
>>> >> > source
>>> >> code itself? Like including it in the main cTAKES release? It seems
>>> >> there is demand for it.
>>> >> >
>>> >> > - Dave
>>> >> >
>>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
>>> >> Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >
>>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
>>> >> >>
>>> >> >> I have checked in two new hsqldb dictionaries, both from the
>>> >> >> 2015AB version of the UMLS.  They both have codes for snomedct_us,
>>> >> >> rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui,
>>> >> >> preferred term
>>> >> mappings.
>>> >> >>
>>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis
>>> >> >> filtered by icd9 and icd10.
>>> >> >> What this means:  Cuis that exist for a [filter source] are added
>>> >> >> to the dictionary, as are all text variations from all sources
>>> >> >> that contain that cui.  Both dictionaries also use the standard
>>> >> >> ctakes semantic group tui filters.
>>> >> >>
>>> >> >> The names are ctakessnorx2015 and ctakesicd2015
>>> >> >>
>>> >> >> The snomed rxnorm :
>>> >> >>
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__sourceforge.net_p_ctakesresources_code_HEAD_tree_trunk_ctakes-2Dresources-2Dsnomed-2Drword-2Dhsqldb-2D2011ab_src_main_resources_org_apache_ctakes_dictionary_lookup_fast_ctakessnorx2015_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=kWCcj3-hcqYWZXIPhsERggDLCO-5gppCRoS1Gav7r2A&e=
>>> >> >>
>>> >> >> The snomed rxnorm icd9 icd10:
>>> >> >>
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=http-3A__sourceforge.net_p_ctakesresources_code_HEAD_tree_trunk_ctakes-2Dresources-2Dsnomed-2Drword-2Dhsqldb-2D2011ab_src_main_resources_org_apache_ctakes_dictionary_lookup_fast_ctakesicd2015_&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=RZ--ZQ2qvGnhm4h2Vvz1oU97qA8BG2G39Tww7EdYgKA&e=
>>> >> >>
>>> >> >> The svn root for the whole ugly thing is:
>>> >> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>> >> >>
>>> >> >> Stats:
>>> >> >> ctakessnorx2015
>>> >> >> 545,913 Terms
>>> >> >> 229,251 Concepts (Cuis)
>>> >> >> 272,987 Snomed codes
>>> >> >> 32,419 Rxnorm codes
>>> >> >> 11,321 icd9 codes
>>> >> >> 61 icd10 codes
>>> >> >>
>>> >> >> Ctakesicd2015
>>> >> >> 611,230 Terms
>>> >> >> 282,211 Concepts
>>> >> >> 18,626 icd9 codes
>>> >> >> 45,818 icd10 codes
>>> >> >> Snomed and Rxnorm counts are the same
>>> >> >>
>>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts and
>>> >> >> ~65,000 terms.
>>> >> >>
>>> >> >> I would like to move this all to a better root (not
>>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
>>> >> >> write directly in trunk (??) and need to get moving on to other
>>> things.
>>> >> >>
>>> >> >> There is help on the ctakes wiki:
>>> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__cwiki.apache.org_confluence_display_CTAKES_cTAKES-2B3.2-2B-2D-2BFast-2BDictionary-2BLookup&d=BQIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=SRqwsl3FmuUXq77GmVlfXn0lE0pVRkL53DNhukcaW6c&s=98W_vAHGZ2FLEMPfrSgEHtZt-mQ3XJjF6yQYM26tqP4&e=
>>> >> >> Though I should probably add a few items ...
>>> >> >>
>>> >> >>
>>> >> >> Sean
>>> >> >>
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: RE: ctakes with icd10
>>> >> >>
>>> >> >> Not to perpetuate the instructions again but I sent these out not
>>> >> >> long ago when I was going through the process and Sean was helping
>>> me.
>>> >> >>
>>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>>> >> >>         3. Run DictionaryCreator2
>>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         4. Run CodeMapCreator
>>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>> >> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Brandon
>>> >> >>
>>> >> >> -----Original Message-----
>>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
>>> >> >> To: dev@ctakes.apache.org
>>> >> >> Subject: Re: ctakes with icd10
>>> >> >>
>>> >> >> This seems like a pretty common request and with such an old
>>> >> >> version of UMLS database shipped with cTAKES it's only going to
>>> get worse.
>>> >> >> I've been wanting to build a dictionary using the latest UMLS
>>> >> >> release (as well as a custom database), so would be happy to write
>>> >> >> up the steps as I go through it. That assumes that I can dig up
>>> >> >> the
>>> >> instructions in the dev list.
>>> >> >>
>>> >> >> - Dave
>>> >> >>
>>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
>>> >> >>
>>> >> >> > Hi Alaa,
>>> >> >> >
>>> >> >> > The -shortest- answer is that you'll need to run the dictionary
>>> >> >> > creation tool.  There are instructions in older devlist threads.
>>> >> >> > By default the dictionary creation tool does add icd9 and icd10
>>> >> >> > tables to the dictionary.
>>> >> >> > The problem is that in Umls 2011AB those codes weren't very well
>>> >> >> > populated.  The 2015AB icd# set is much richer so those
>>> >> >> > tables should be pretty good.  Then in ctakes you would look up
>>> >> >> > annotations by icd9 or icd10 codes instead of by cui:
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>>> >> >> >
>>> >> >> > Sean
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Savova, Guergana
>>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: RE: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi Alaa,
>>> >> >> > You need to create a resource off the terminology/ontology you
>>> >> >> > want to use (in this case ICD9 or ICD10). Then run that resource
>>> >> >> > with cTAKES for the fast dictionary lookup. There is cTAKES code
>>> >> >> > and some documentation on how to create that resource. By
>>> >> >> > default, cTAKES runs with a resource created from the English
>>> >> >> > version of SNOMED CT
>>> >> and RxNORM.
>>> >> >> > Hope this helps.
>>> >> >> > --Guergana
>>> >> >> >
>>> >> >> > -----Original Message-----
>>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>>> >> >> > To: dev@ctakes.apache.org
>>> >> >> > Subject: ctakes with icd10
>>> >> >> >
>>> >> >> > Hi,
>>> >> >> >
>>> >> >> > I downloaded Latest umls version, and I want to know how to make
>>> >> >> > ctakes work with icd10 and icd9.
>>> >> >> >
>>> >> >> >
>>> >> >> > Thanks
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >> IMPORTANT WARNING: The information in this message (and the
>>> >> >> documents attached to it, if any) is confidential and may be
>>> legally privileged.
>>> >> >> It is intended solely for the addressee. Access to this message by
>>> >> >> anyone else is unauthorized. If you are not the intended
>>> >> >> recipient, any disclosure, copying, distribution or any action
>>> >> >> taken, or omitted to be taken, in reliance on it is prohibited and
>>> >> >> may be unlawful. If you have received this message in error,
>>> >> >> please delete all electronic copies of this message (and the
>>> >> >> documents attached to it, if any), destroy any hard copies you may
>>> >> >> have created and notify me immediately
>>> >> by replying to this email. Thank you.
>>> >> >>
>>> >> >> Geisinger Health System utilizes an encryption process to
>>> >> >> safeguard Protected Health Information and other confidential data
>>> >> >> contained in external e-mail messages. If email is encrypted, the
>>> >> >> recipient will receive an e-mail instructing them to sign on to
>>> >> >> the Geisinger Health System Secure E-mail Message Center to
>>> retrieve the encrypted e-mail.
>>> >> >>

-- 
Eng Alaa Al-Barari
phone 0599297470

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Geise, Brandon D." <bd...@geisinger.edu>.
If you have the inserts in the .script file, you can load them by running SqlTool with the sqltool.rc in ./resource/memdbtemplate.  You'll need to open sqltool.rc and change the path in the jdbc file connection string to point to where your new dictionary is located.

You can then run it using something like:

  java -cp .\lib\hsqldb_1_8_0_10.jar org.hsqldb.util.SqlTool --rcfile .\resource\memdbtemplate\sqltool.rc cTakesUmls

From there you should get a fairly familiar SQL command-line interface, where you can run some counts on all the tables and check the content.
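
The SqlTool command above uses Windows-style backslash paths. As a minimal sketch (the class name SqlToolCommand and its build helper are made up for illustration; the jar name, rc file location and urlid are simply the ones from the command above), the same launch can be assembled with the platform's own file separator so that it also runs on Linux:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class SqlToolCommand {

    /** Builds the argument list for launching SqlTool with a given rc file and urlid. */
    static List<String> build(String hsqldbJar, String rcFile, String urlId) {
        return Arrays.asList(
                "java", "-cp", hsqldbJar,
                "org.hsqldb.util.SqlTool",
                "--rcfile", rcFile,
                urlId);
    }

    public static void main(String[] args) throws Exception {
        // File.separator is "\" on Windows and "/" on Linux, so the paths adapt to the platform.
        String jar = String.join(File.separator, ".", "lib", "hsqldb_1_8_0_10.jar");
        String rc = String.join(File.separator, ".", "resource", "memdbtemplate", "sqltool.rc");

        List<String> command = build(jar, rc, "cTakesUmls");
        System.out.println(String.join(" ", command));

        // inheritIO() connects SqlTool's interactive prompt to this terminal.
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

This assumes it is started from the dictionarytool directory, so that ./lib and ./resource resolve the same way as in the command above.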

If this all works you can then try to use the new dictionaries with cTAKES.
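
For a quick sanity check of the table counts without opening SqlTool, the INSERT statements in the .script file can also be counted per table. This is only a rough sketch: the class name ScriptInsertCounter is made up, and it assumes each row is stored as a single line beginning with "INSERT INTO <table> VALUES(", as in the rows quoted further down in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptInsertCounter {

    // Captures the table name at the start of an "INSERT INTO <table> VALUES(...)" line.
    private static final Pattern INSERT = Pattern.compile("^INSERT INTO (\\S+) VALUES\\(");

    /** Counts INSERT statements per table name; lines that are not INSERTs are ignored. */
    static Map<String, Integer> countInserts(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = INSERT.matcher(line);
            if (matcher.find()) {
                counts.merge(matcher.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the .script file, e.g. output/ctakesicd2015.script.
        // ISO-8859-1 maps every byte to a character, so reading never fails on odd bytes.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        countInserts(lines).forEach((table, n) -> System.out.println(table + "\t" + n));
    }
}
```

If a table such as CUI_TERMS shows zero rows, the dictionary creation most likely did not populate the database.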


Hope that helps,
Brandon

-----Original Message-----
From: Alaa al Barari [mailto:alaa.albarari@gmail.com] 
Sent: Thursday, December 10, 2015 3:37 AM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Finan, where can I download the 2015 .properties files from SourceForge? Do those cover all ICDs and SNOMED?

I prefer to learn how to generate my own db because I will need to create my own later on, so your help is appreciated.

On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari <al...@gmail.com>
wrote:

> Thanks, but is what I ended up with wrong?
> On Dec 10, 2015 4:26 AM, "Finan, Sean" 
> <Se...@childrens.harvard.edu>
> wrote:
>
>> Hi Alaa,
>>
>> If you downloaded the 2015 .properties and .script files then you do
>> not need to run the dictionary creation tool.  Those databases are 
>> already populated and ready to use.
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> Sent: Wednesday, December 09, 2015 6:33 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>
>> So basically it looks like the path had Desktop with a capital D, which
>> is why it did not work.
>>
>> I ended up having rows like this inside ctakesicd2015.script:
>>
>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
>>
>>
>> does this mean it worked ?
>>
>>
>>
>>
>>
>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari 
>> <al...@gmail.com>
>> wrote:
>>
>> > Thanks Finan and Brandon, your help is appreciated a lot.
>> >
>> > I downloaded the dictionary tool from 
>> > https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
>> > I hope it's the latest and bug-free.
>> >
>> >
>> > my running command is:
>> > java -cp ./dictionarytool.jar:lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /home/abarari/Desktop/umls/2015AB/META/ -atui ./data/optional/CtakesAnatTuis.txt -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 -tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt -tui ./data/optional/CtakesAllTuis.txt
>> >
>> >
>> >
>> > I am running on ubuntu by the way ... anyway under 
>> > /home/abarari/Desktop/dictionarytool/output/
>> >
>> > there is only
>> >
>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls 
>> > ctakesicd2015.log ctakesicd2015.properties  ctakesicd2015.script
>> >
>> >
>> > where is the database ? am I doing something wrong ? do I need to 
>> > create the database before executing the dictionarytool or what ?
>> >
>> >
>> > I found a couple of issues in the dictionary tool; for example, it does
>> > not work well with relative paths.
>> >
>> >
>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
>> >
>> >> Brandon,
>> >> That sounds great!
>> >> Please open a Jira ticket for any contributions (anyone should be 
>> >> able to create a Jira account).  There are some legal items built 
>> >> into the ASF Jira attachments for accepting contributions/donations.
>> >> It will also credit the contributors with the merit appropriately.
>> >> Anyone who is interested can follow the Jira item. (Even better if 
>> >> contributions were open discussion/open development.) --Pei
>> >>
>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>> >> <bd...@geisinger.edu> wrote:
>> >> > I'd be interested in contributing to making the dictionary tool 
>> >> > more
>> >> user friendly with a GUI.
>> >> >
>> >> > Thanks,
>> >> > Brandon
>> >> >
>> >> > -----Original Message-----
>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
>> sourceforge!
>> >> >
>> >> > Hi Dave,
>> >> >
>> >> > I'm always happy to see interest in our stuff!
>> >> >
>> >> >>Step 1
>> >> > I built the tool to be able to build a dictionary using anything 
>> >> > in the
>> >> umls - snomed, icd9, hpo, etc. so using the veterinary extension 
>> >> shouldn't be a problem.  You just add it to the CtakesSources file 
>> >> (or create an alternate file and point to it with -src).  To 
>> >> answer another of your questions, there can be zero or more 
>> >> sources - you saw snomedct and snomedct_us (each valid in a different umls version).
>> >> > It also can include any semantic type, just add (or remove) the
>> >> appropriate tuis in a different data file.
>> >> >
>> >> >>Step 2
>> >> > You have it right - you copy the templates to another location 
>> >> > and
>> >> output to that location.  Otherwise you 'lose' your templates.
>> >> >
>> >> >>Step 3 and 4
>> >> > The jar is built from source.  I need to (soon) check in updates 
>> >> > to the
>> >> source, and at the same time I can check in a default prebuilt 
>> >> .jar The lib/ directory is in the source repository.
>> >> >
>> >> > Various people have toyed with the idea of putting the tool into 
>> >> > a
>> >> ctakes module, putting it into an "installation package", making a 
>> >> gui
>> ...
>> >> The best option (imo) is probably to make an easy to use gui and 
>> >> keep a pre-built version in sandbox.  Someday, after the rainbow, 
>> >> maybe I'll get a chance to do that ...
>> >> >
>> >> > Sean
>> >> >
>> >> >
>> >> > -----Original Message-----
>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
>> sourceforge!
>> >> >
>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to date
>> >> database out of the box. Hopefully it will cut down on the need 
>> >> for many to build their own DB's. Thank you much for doing that.
>> >> >
>> >> > Unfortunately, I still will need to build a custom one for us. I 
>> >> > work
>> >> in veterinary medicine so I need to add in the veterinary 
>> >> extension for SNOMED-CT into the database.
>> >> >
>> >> > I looked over the steps below that Brandon included and have 
>> >> > some
>> >> questions:
>> >> >
>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
>> "SNOMEDCT"
>> >> to "SNOMEDCT_US". The file that I have has two lines in it. First 
>> >> line is SNOMED, second line is SNOMEDCT_US. So this step doesn't
>> really make sense.
>> >> >
>> >> > step 2 should reference the two scripts as being in
>> >> resource/memdbtemplate so others don't have to search for them. 
>> >> Not sure what it means to move them to "location to put new UMLS DB".
>> >> Does that mean move them into a new directory where the newly 
>> >> created UMLS DB will get written?
>> >> >
>> >> > steps 3 and 4 for running the tools reference dictionarytool.jar 
>> >> > which
>> >> doesn't exist. Does one need to build that somehow from the source 
>> >> before running it? The command line also adds "lib/*" to the 
>> >> classpath. Is that the lib directory inside the dictionarytool 
>> >> source code or some other location?
>> >> >
>> >> > What else would I need to do to include the SNOMED-CT Veterinary
>> >> Extension along with the snomedct and rxnorm sources?
>> >> >
>> >> > I'll probably not have time to try this out for a while yet, but 
>> >> > when I
>> >> do I'd be happy to write up an easy to follow tutorial for 
>> >> building a custom dictionary assuming I am able to get it to work.
>> >> >
>> >> > Has anyone considered making this tool available outside of the 
>> >> > source
>> >> code itself? Like including it in the main cTAKES release? It 
>> >> seems there is demand for it.
>> >> >
>> >> > - Dave
>> >> >
>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
>> >> Sean.Finan@childrens.harvard.edu> wrote:
>> >> >
>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
>> >> >>
>> >> >> I have checked in two new hsqldb dictionaries, both from the 
>> >> >> 2015AB version of the UMLS.  They both have codes for 
>> >> >> snomedct_us, rxnorm, icd9cm and icd10pcs - as well as the usual 
>> >> >> cui, tui, preferred term
>> >> mappings.
>> >> >>
>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds 
>> >> >> cuis filtered by icd9 and icd10.
>> >> >> What this means:  Cuis that exist for a [filter source] are 
>> >> >> added to the dictionary, as are all text variations from all 
>> >> >> sources that contain that cui.  Both dictionaries also use the 
>> >> >> standard ctakes semantic group tui filters.
>> >> >>
>> >> >> The names are ctakessnorx2015 and ctakesicd2015
>> >> >>
>> >> >> The snomed rxnorm :
>> >> >>
>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>> >> >>
>> >> >> The snomed rxnorm icd9 icd10:
>> >> >>
>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>> >> >>
>> >> >> The svn root for the whole ugly thing is:
>> >> >>  svn checkout 
>> >> >> svn://svn.code.sf.net/p/ctakesresources/code/trunk
>> >> >>
>> >> >> Stats:
>> >> >> ctakessnorx2015
>> >> >> 545,913 Terms
>> >> >> 229,251 Concepts (Cuis)
>> >> >> 272,987 Snomed codes
>> >> >> 32,419 Rxnorm codes
>> >> >> 11,321 icd9 codes
>> >> >> 61 icd10 codes
>> >> >>
>> >> >> Ctakesicd2015
>> >> >> 611,230 Terms
>> >> >> 282,211 Concepts
>> >> >> 18,626 icd9 codes
>> >> >> 45,818 icd10 codes
>> >> >> Snomed and Rxnorm counts are the same
>> >> >>
>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts 
>> >> >> and
>> >> >> ~65,000 terms.
>> >> >>
>> >> >> I would like to move this all to a better root (not
>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able 
>> >> >> to write directly in trunk (??) and need to get moving on to 
>> >> >> other
>> things.
>> >> >>
>> >> >> There is help on the ctakes wiki:
>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> >> >> Though I should probably add a few items ...
>> >> >>
>> >> >>
>> >> >> Sean
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
>> >> >> To: dev@ctakes.apache.org
>> >> >> Subject: RE: ctakes with icd10
>> >> >>
>> >> >> Not to perpetuate the instructions again but I sent these out
>> >> >> not long ago when I was going through the process and Sean was
>> >> >> helping me.
>> >> >>
>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>> >> >>         3. Run DictionaryCreator2
>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >> >>         4. Run CodeMapCreator
>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>> >> >>
>> >> >> Thanks,
>> >> >> Brandon
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
>> >> >> To: dev@ctakes.apache.org
>> >> >> Subject: Re: ctakes with icd10
>> >> >>
>> >> >> This seems like a pretty common request and with such an old 
>> >> >> version of UMLS database shipped with cTAKES it's only going to 
>> >> >> get
>> worse.
>> >> >> I've been wanting to build a dictionary using the latest UMLS 
>> >> >> release (as well as a custom database), so would be happy to 
>> >> >> write up the steps as I go through it. That assumes that I can 
>> >> >> dig up the
>> >> instructions in the dev list.
>> >> >>
>> >> >> - Dave
>> >> >>
>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
>> >> >>
>> >> >> > Hi Alaa,
>> >> >> >
>> >> >> > The -shortest- answer is that you'll need to run the
>> >> >> > dictionary creation tool.  There are instructions in older devlist threads.
>> >> >> > By default the dictionary creation tool does add icd9 and
>> >> >> > icd10 tables to the dictionary.
>> >> >> > The problem is that in Umls 2011AB those codes weren't very
>> >> >> > well populated.  The 2015AB icd# set is much richer so
>> >> >> > those tables should be pretty good.  Then in ctakes you would
>> >> >> > look up annotations by icd9 or icd10 codes instead of by cui:
>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>> >> >> >
>> >> >> > Sean
>> >> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: Savova, Guergana
>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>> >> >> > To: dev@ctakes.apache.org
>> >> >> > Subject: RE: ctakes with icd10
>> >> >> >
>> >> >> > Hi Alaa,
>> >> >> > You need to create a resource off the terminology/ontology 
>> >> >> > you want to use (in this case ICD9 or ICD10). Then run that 
>> >> >> > resource with cTAKES for the fast dictionary lookup. There is 
>> >> >> > cTAKES code and some documentation on how to create that 
>> >> >> > resource. By default, cTAKES runs with a resource created 
>> >> >> > from the English version of SNOMED CT
>> >> and RxNORM.
>> >> >> > Hope this helps.
>> >> >> > --Guergana
>> >> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>> >> >> > To: dev@ctakes.apache.org
>> >> >> > Subject: ctakes with icd10
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I downloaded Latest umls version, and I want to know how to 
>> >> >> > make ctakes work with icd10 and icd9.
>> >> >> >
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >>
>> >> >>


--
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
Finan, where can I download the 2015 .properties files from SourceForge? Do
those cover all ICDs and SNOMED?

I prefer to learn how to generate my own db because I will need to create
my own later on, so your help is appreciated.

On Thu, Dec 10, 2015 at 9:13 AM, Alaa al Barari <al...@gmail.com>
wrote:

> Thanks, but is what I ended up with wrong?
> On Dec 10, 2015 4:26 AM, "Finan, Sean" <Se...@childrens.harvard.edu>
> wrote:
>
>> Hi Alaa,
>>
>> If you downloaded the 2015 .properties and .script files then you do not
>> need to run the dictionary creation tool.  Those databases are already
>> populated and ready to use.
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> Sent: Wednesday, December 09, 2015 6:33 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>>
>> So basically it looks like the path had Desktop with a capital D, which
>> is why it did not work.
>>
>> I ended up having rows like this inside ctakesicd2015.script:
>>
>> INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
>> INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
>> INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')
>>
>>
>> does this mean it worked ?
>>
>>
>>
>>
>>
>> On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <al...@gmail.com>
>> wrote:
>>
>> > Thanks Finan and Brandon, your help is appreciated a lot.
>> >
>> > I downloaded the dictionary tool from
>> > https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
>> > I hope it's the latest and bug-free.
>> >
>> >
>> > my running command is:
>> > java -cp ./dictionarytool.jar:lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /home/abarari/Desktop/umls/2015AB/META/ -atui ./data/optional/CtakesAnatTuis.txt -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 -tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt -tui ./data/optional/CtakesAllTuis.txt
>> >
>> >
>> >
>> > I am running on ubuntu by the way ... anyway under
>> > /home/abarari/Desktop/dictionarytool/output/
>> >
>> > there is only
>> >
>> >  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
>> > ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script
>> >
>> >
>> > where is the database ? am I doing something wrong ? do I need to
>> > create the database before executing the dictionarytool or what ?
>> >
>> >
>> > I found a couple of issues in the dictionary tool; for example, it does
>> > not work well with relative paths.
>> >
>> >
>> > On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
>> >
>> >> Brandon,
>> >> That sounds great!
>> >> Please open a Jira ticket for any contributions (anyone should be
>> >> able to create a Jira account).  There are some legal items built
>> >> into the ASF Jira attachments for accepting contributions/donations.
>> >> It will also credit the contributors with the merit appropriately.
>> >> Anyone who is interested can follow the Jira item. (Even better if
>> >> contributions were open discussion/open development.) --Pei
>> >>
>> >> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>> >> <bd...@geisinger.edu> wrote:
>> >> > I'd be interested in contributing to making the dictionary tool
>> >> > more
>> >> user friendly with a GUI.
>> >> >
>> >> > Thanks,
>> >> > Brandon
>> >> >
>> >> > -----Original Message-----
>> >> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>> >> > Sent: Tuesday, December 08, 2015 6:12 PM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: RE: ctakes with icd10; 2015 versions available on
>> sourceforge!
>> >> >
>> >> > Hi Dave,
>> >> >
>> >> > I'm always happy to see interest in our stuff!
>> >> >
>> >> >>Step 1
>> >> > I built the tool to be able to build a dictionary using anything in
>> >> > the
>> >> umls - snomed, icd9, hpo, etc. so using the veterinary extension
>> >> shouldn't be a problem.  You just add it to the CtakesSources file
>> >> (or create an alternate file and point to it with -src).  To answer
>> >> another of your questions, there can be zero or more sources - you
>> >> saw snomedct and snomedct_us (each valid in a different umls version).
>> >> > It also can include any semantic type, just add (or remove) the
>> >> appropriate tuis in a different data file.
>> >> >
>> >> >>Step 2
>> >> > You have it right - you copy the templates to another location and
>> >> output to that location.  Otherwise you 'lose' your templates.
>> >> >
>> >> >>Step 3 and 4
>> >> > The jar is built from source.  I need to (soon) check in updates to
>> >> > the
>> >> source, and at the same time I can check in a default prebuilt .jar
>> >> The lib/ directory is in the source repository.
>> >> >
>> >> > Various people have toyed with the idea of putting the tool into a
>> >> ctakes module, putting it into an "installation package", making a gui
>> ...
>> >> The best option (imo) is probably to make an easy to use gui and keep
>> >> a pre-built version in sandbox.  Someday, after the rainbow, maybe
>> >> I'll get a chance to do that ...
>> >> >
>> >> > Sean
>> >> >
>> >> >
>> >> > -----Original Message-----
>> >> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> >> > Sent: Tuesday, December 08, 2015 4:57 PM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: Re: ctakes with icd10; 2015 versions available on
>> sourceforge!
>> >> >
>> >> > Thanks, Sean! It's great that cTAKES may soon have an up to date
>> >> database out of the box. Hopefully it will cut down on the need for
>> >> many to build their own DB's. Thank you much for doing that.
>> >> >
>> >> > Unfortunately, I still will need to build a custom one for us. I
>> >> > work
>> >> in veterinary medicine so I need to add in the veterinary extension
>> >> for SNOMED-CT into the database.
>> >> >
>> >> > I looked over the steps below that Brandon included and have some
>> >> questions:
>> >> >
>> >> > step 1 says to "Change /data/default/CtakesSources.txt from
>> "SNOMEDCT"
>> >> to "SNOMEDCT_US". The file that I have has two lines in it. First
>> >> line is SNOMED, second line is SNOMEDCT_US. So this step doesn't
>> really make sense.
>> >> >
>> >> > step 2 should reference the two scripts as being in
>> >> resource/memdbtemplate so others don't have to search for them. Not
>> >> sure what it means to move them to "location to put new UMLS DB".
>> >> Does that mean move them into a new directory where the newly created
>> >> UMLS DB will get written?
>> >> >
>> >> > steps 3 and 4 for running the tools reference dictionarytool.jar
>> >> > which
>> >> doesn't exist. Does one need to build that somehow from the source
>> >> before running it? The command line also adds "lib/*" to the
>> >> classpath. Is that the lib directory inside the dictionarytool source
>> >> code or some other location?
>> >> >
>> >> > What else would I need to do to include the SNOMED-CT Veterinary
>> >> Extension along with the snomedct and rxnorm sources?
>> >> >
>> >> > I'll probably not have time to try this out for a while yet, but
>> >> > when I
>> >> do I'd be happy to write up an easy to follow tutorial for building a
>> >> custom dictionary assuming I am able to get it to work.
>> >> >
>> >> > Has anyone considered making this tool available outside of the
>> >> > source
>> >> code itself? Like including it in the main cTAKES release? It seems
>> >> there is demand for it.
>> >> >
>> >> > - Dave
>> >> >
>> >> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
>> >> Sean.Finan@childrens.harvard.edu> wrote:
>> >> >
>> >> >> Hi Brandon, thanks for finding and forwarding the instructions!
>> >> >>
>> >> >> I have checked in two new hsqldb dictionaries, both from the
>> >> >> 2015AB version of the UMLS.  They both have codes for snomedct_us,
>> >> >> rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui,
>> >> >> preferred term
>> >> mappings.
>> >> >>
>> >> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis
>> >> >> filtered by icd9 and icd10.
>> >> >> What this means:  Cuis that exist for a [filter source] are added
>> >> >> to the dictionary, as are all text variations from all sources
>> >> >> that contain that cui.  Both dictionaries also use the standard
>> >> >> ctakes semantic group tui filters.
>> >> >>
>> >> >> The names are ctakessnorx2015 and ctakesicd2015
>> >> >>
>> >> >> The snomed rxnorm :
>> >> >>
>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>> >> >>
>> >> >> The snomed rxnorm icd9 icd10:
>> >> >>
>> >> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>> >> >>
>> >> >> The svn root for the whole ugly thing is:
>> >> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>> >> >>
>> >> >> Stats:
>> >> >> ctakessnorx2015
>> >> >> 545,913 Terms
>> >> >> 229,251 Concepts (Cuis)
>> >> >> 272,987 Snomed codes
>> >> >> 32,419 Rxnorm codes
>> >> >> 11,321 icd9 codes
>> >> >> 61 icd10 codes
>> >> >>
>> >> >> Ctakesicd2015
>> >> >> 611,230 Terms
>> >> >> 282,211 Concepts
>> >> >> 18,626 icd9 codes
>> >> >> 45,818 icd10 codes
>> >> >> Snomed and Rxnorm counts are the same
>> >> >>
>> >> >> So, adding the icd filters gave us an extra ~53,000 concepts and
>> >> >> ~65,000 terms.
>> >> >>
>> >> >> I would like to move this all to a better root (not
>> >> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
>> >> >> write directly in trunk (??) and need to get moving on to other
>> >> >> things.
>> >> >>
>> >> >> There is help on the ctakes wiki:
>> >> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> >> >> Though I should probably add a few items ...
>> >> >>
>> >> >>
>> >> >> Sean
>> >> >>
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> >> >> Sent: Tuesday, December 08, 2015 12:51 PM
>> >> >> To: dev@ctakes.apache.org
>> >> >> Subject: RE: ctakes with icd10
>> >> >>
>> >> >> Not to perpetuate the instructions again but I sent these out not
>> >> >> long ago when I was going through the process and Sean was helping
>> >> >> me.
>> >> >>
>> >> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>> >> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>> >> >>         3. Run DictionaryCreator2
>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >> >>         4. Run CodeMapCreator
>> >> >>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>> >> >>
>> >> >> Thanks,
>> >> >> Brandon
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> >> >> Sent: Tuesday, December 08, 2015 12:47 PM
>> >> >> To: dev@ctakes.apache.org
>> >> >> Subject: Re: ctakes with icd10
>> >> >>
>> >> >> This seems like a pretty common request and with such an old
>> >> >> version of UMLS database shipped with cTAKES it's only going to get
>> >> >> worse.
>> >> >> I've been wanting to build a dictionary using the latest UMLS
>> >> >> release (as well as a custom database), so would be happy to write
>> >> >> up the steps as I go through it. That assumes that I can dig up
>> >> >> the instructions in the dev list.
>> >> >>
>> >> >> - Dave
>> >> >>
>> >> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>> >> >> Sean.Finan@childrens.harvard.edu> wrote:
>> >> >>
>> >> >> > Hi Alaa,
>> >> >> >
>> >> >> > The -shortest- answer is that you'll need to run the dictionary
>> >> >> > creation tool.  There are instructions in older devlist threads.
>> >> >> > By default the dictionary creation tool does add icd9 and icd10
>> >> >> > tables to the dictionary.
>> >> >> > The problem is that in Umls 2011AB those codes weren't very well
>> >> >> > populated.  The 2015AB icd# set is much richer so those
>> >> >> > tables should be pretty good.  Then in ctakes you would look up
>> >> >> > annotations by icd9 or icd10 codes instead of by cui:
>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>> >> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>> >> >> >
>> >> >> > Sean
>> >> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: Savova, Guergana
>> >> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> >> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>> >> >> > To: dev@ctakes.apache.org
>> >> >> > Subject: RE: ctakes with icd10
>> >> >> >
>> >> >> > Hi Alaa,
>> >> >> > You need to create a resource off the terminology/ontology you
>> >> >> > want to use (in this case ICD9 or ICD10). Then run that resource
>> >> >> > with cTAKES for the fast dictionary lookup. There is cTAKES code
>> >> >> > and some documentation on how to create that resource. By
>> >> >> > default, cTAKES runs with a resource created from the English
>> >> >> > version of SNOMED CT and RxNORM.
>> >> >> > Hope this helps.
>> >> >> > --Guergana
>> >> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> >> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>> >> >> > To: dev@ctakes.apache.org
>> >> >> > Subject: ctakes with icd10
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > I downloaded Latest umls version, and I want to know how to make
>> >> >> > ctakes work with icd10 and icd9.
>> >> >> >
>> >> >> >
>> >> >> > Thanks
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >>


-- 
Eng Alaa Al-Barari
phone 0599297470

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
Thanks, but is what I ended up with wrong?
On Dec 10, 2015 4:26 AM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:

> Hi Alaa,
>
> If you downloaded the 2015 .properties and .script files then you do not
> need to run the dictionary creation tool.  Those databases are already
> populated and ready to use.
>
> Sean
>

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Alaa,

If you downloaded the 2015 .properties and .script files then you do not need to run the dictionary creation tool.  Those databases are already populated and ready to use.
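
A minimal sketch of what "ready to use" means here, assuming the downloaded files sit together in one directory: an HSQLDB file database is addressed by its directory plus base name with no extension, and the .properties and .script files with that base name are the database itself. The class and method names below are hypothetical and are not part of cTAKES or the dictionary tool.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class HsqlFileDbCheck {

    // True when both <baseName>.properties and <baseName>.script exist in dir.
    static boolean hasDatabaseFiles(Path dir, String baseName) {
        return Files.isRegularFile(dir.resolve(baseName + ".properties"))
                && Files.isRegularFile(dir.resolve(baseName + ".script"));
    }

    // An HSQLDB file URL names the database by path and base name, without an extension.
    static String jdbcUrl(Path dir, String baseName) {
        return "jdbc:hsqldb:file:" + dir.resolve(baseName).toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        // Both arguments are placeholders; pass your own directory and base name.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String baseName = args.length > 1 ? args[1] : "ctakesicd2015";
        System.out.println("files present: " + hasDatabaseFiles(dir, baseName));
        System.out.println("url:           " + jdbcUrl(dir, baseName));
    }
}
```

The printed URL is the kind of value passed to -db elsewhere in this thread; there is no separate database file to look for beyond the ones checked here.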

Sean


-----Original Message-----
From: Alaa al Barari [mailto:alaa.albarari@gmail.com] 
Sent: Wednesday, December 09, 2015 6:33 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

So basically, it looks like the path had "Desktop" capitalized, which is why it did not work.

I ended up with rows like these inside ctakesicd2015.script:

INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')


Does this mean it worked?
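
One rough way to check the result, sketched under the assumption that each row is written on its own line as INSERT INTO CUI_TERMS VALUES(...): count those lines in the .script file and compare against the term counts reported in this thread. The class name and the default file name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class ScriptRowCounter {

    // Counts lines that begin with "INSERT INTO <table> VALUES".
    static long countInserts(Stream<String> lines, String table) {
        String prefix = "INSERT INTO " + table + " VALUES";
        return lines.filter(line -> line.startsWith(prefix)).count();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the path to your own .script file.
        Path script = Paths.get(args.length > 0 ? args[0] : "ctakesicd2015.script");
        // ISO-8859-1 maps every byte to a character, so unexpected bytes cannot abort the read.
        try (Stream<String> lines = Files.lines(script, StandardCharsets.ISO_8859_1)) {
            System.out.println("CUI_TERMS rows: " + countInserts(lines, "CUI_TERMS"));
        }
    }
}
```

A count of zero would suggest the tool created the tables but loaded no terms; a count in the hundreds of thousands is in line with the stats quoted in this thread.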





On Thu, Dec 10, 2015 at 1:07 AM, Alaa al Barari <al...@gmail.com>
wrote:

> Thanks Finan and Brandon, your help is appreciated a lot.
>
> I downloaded the dictionary tool from
> https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
> I hope it's the latest and bug free.
>
>
> My running command is:
>
> java -cp ./dictionarytool.jar:lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls /home/abarari/Desktop/umls/2015AB/META/ -atui ./data/optional/CtakesAnatTuis.txt -db jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015 -tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt -tui ./data/optional/CtakesAllTuis.txt
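
The commands in this thread differ in the classpath separator: ";" in the Windows form and ":" in the Ubuntu form. A small sketch of building the same invocation portably, using Java's File.pathSeparator; the class name and the paths are placeholders, and only flags that appear in this thread are used.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DictionaryToolCommand {

    // Joins classpath entries with the platform separator (";" on Windows, ":" elsewhere).
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    // Builds the argument list: java -cp <classpath> <mainClass> <toolArgs...>
    static List<String> command(String mainClass, String... toolArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath("dictionarytool.jar", "lib/*"));
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(toolArgs));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(
                "org.apache.ctakes.dictionarytool.DictionaryCreator2",
                "-umls", "/path/to/umls/META",                      // placeholder path
                "-db", "jdbc:hsqldb:file:/path/to/newDB/snorx2015", // placeholder path
                "-tbl", "CUI_TERMS");
        // Prints the command for inspection; it is not executed here.
        System.out.println(String.join(" ", cmd));
    }
}
```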
>
>
>
> I am running on Ubuntu, by the way. Under
> /home/abarari/Desktop/dictionarytool/output/ there is only:
>
>  abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
>  ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script
>
>
> Where is the database? Am I doing something wrong? Do I need to
> create the database before executing the dictionary tool?
>
>
> I found a couple of issues in the dictionary tool; it does not work well
> with relative paths.
>
>
> On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:
>
>> Brandon,
>> That sounds great!
>> Please open a Jira ticket for any contributions (anyone should be 
>> able to create a Jira account).  There are some legal items built 
>> into the ASF Jira attachments for accepting contributions/donations.
>> It will also credit the contributors with the merit appropriately.
>> Anyone who is interested can follow the Jira item. (Even better if 
>> contributions were open discussion/open development.) --Pei
>>
>> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
>> <bd...@geisinger.edu> wrote:
>> > I'd be interested in contributing to making the dictionary tool
>> > more user friendly with a GUI.
>> >
>> > Thanks,
>> > Brandon
>> >
>> > -----Original Message-----
>> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
>> > Sent: Tuesday, December 08, 2015 6:12 PM
>> > To: dev@ctakes.apache.org
>> > Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
>> >
>> > Hi Dave,
>> >
>> > I'm always happy to see interest in our stuff!
>> >
>> >>Step 1
>> > I built the tool to be able to build a dictionary using anything in
>> > the umls - snomed, icd9, hpo, etc. so using the veterinary extension
>> > shouldn't be a problem.  You just add it to the CtakesSources file
>> > (or create an alternate file and point to it with -src).  To answer
>> > another of your questions, there can be zero or more sources - you
>> > saw snomedct and snomedct_us (each valid in a different umls version).
>> > It also can include any semantic type, just add (or remove) the
>> > appropriate tuis in a different data file.
>> >
>> >>Step 2
>> > You have it right - you copy the templates to another location and
>> > output to that location.  Otherwise you 'lose' your templates.
>> >
>> >>Step 3 and 4
>> > The jar is built from source.  I need to (soon) check in updates to
>> > the source, and at the same time I can check in a default prebuilt
>> > .jar.  The lib/ directory is in the source repository.
>> >
>> > Various people have toyed with the idea of putting the tool into a
>> > ctakes module, putting it into an "installation package", making a
>> > gui ...  The best option (imo) is probably to make an easy to use gui
>> > and keep a pre-built version in sandbox.  Someday, after the rainbow,
>> > maybe I'll get a chance to do that ...
>> >
>> > Sean
>> >
>> >
>> > -----Original Message-----
>> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> > Sent: Tuesday, December 08, 2015 4:57 PM
>> > To: dev@ctakes.apache.org
>> > Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>> >
>> > Thanks, Sean! It's great that cTAKES may soon have an up to date
>> > database out of the box. Hopefully it will cut down on the need for
>> > many to build their own DBs. Thank you much for doing that.
>> >
>> > Unfortunately, I still will need to build a custom one for us. I
>> > work in veterinary medicine so I need to add in the veterinary
>> > extension for SNOMED-CT into the database.
>> >
>> > I looked over the steps below that Brandon included and have some
>> > questions:
>> >
>> > step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT"
>> > to "SNOMEDCT_US"". The file that I have has two lines in it. First
>> > line is SNOMED, second line is SNOMEDCT_US. So this step doesn't
>> > really make sense.
>> >
>> > step 2 should reference the two scripts as being in
>> > resource/memdbtemplate so others don't have to search for them. Not
>> > sure what it means to move them to "location to put new UMLS DB".
>> > Does that mean move them into a new directory where the newly created
>> > UMLS DB will get written?
>> >
>> > steps 3 and 4 for running the tools reference dictionarytool.jar
>> > which doesn't exist. Does one need to build that somehow from the
>> > source before running it? The command line also adds "lib/*" to the
>> > classpath. Is that the lib directory inside the dictionarytool source
>> > code or some other location?
>> >
>> > What else would I need to do to include the SNOMED-CT Veterinary
>> > Extension along with the snomedct and rxnorm sources?
>> >
>> > I'll probably not have time to try this out for a while yet, but
>> > when I do I'd be happy to write up an easy to follow tutorial for
>> > building a custom dictionary assuming I am able to get it to work.
>> >
>> > Has anyone considered making this tool available outside of the
>> > source code itself? Like including it in the main cTAKES release?
>> > It seems there is demand for it.
>> >
>> > - Dave
>> >
>> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <Sean.Finan@childrens.harvard.edu> wrote:
>> >
>> >> Hi Brandon, thanks for finding and forwarding the instructions!
>> >>
>> >> I have checked in two new hsqldb dictionaries, both from the 
>> >> 2015AB version of the UMLS.  They both have codes for snomedct_us, 
>> >> rxnorm, icd9cm and icd10pcs - as well as the usual cui, tui, 
>> >> preferred term mappings.
>> >>
>> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis 
>> >> filtered by icd9 and icd10.
>> >> What this means:  Cuis that exist for a [filter source] are added 
>> >> to the dictionary, as are all text variations from all sources 
>> >> that contain that cui.  Both dictionaries also use the standard 
>> >> ctakes semantic group tui filters.
>> >>
>> >> The names are ctakessnorx2015 and ctakesicd2015
>> >>
>> >> The snomed rxnorm :
>> >>
>> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>> >>
>> >> The snomed rxnorm icd9 icd10:
>> >>
>> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>> >>
>> >> The svn root for the whole ugly thing is:
>> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>> >>
>> >> Stats:
>> >> ctakessnorx2015
>> >> 545,913 Terms
>> >> 229,251 Concepts (Cuis)
>> >> 272,987 Snomed codes
>> >> 32,419 Rxnorm codes
>> >> 11,321 icd9 codes
>> >> 61 icd10 codes
>> >>
>> >> Ctakesicd2015
>> >> 611,230 Terms
>> >> 282,211 Concepts
>> >> 18,626 icd9 codes
>> >> 45,818 icd10 codes
>> >> Snomed and Rxnorm counts are the same
>> >>
>> >> So, adding the icd filters gave us an extra ~53,000 concepts and
>> >> ~65,000 terms.
>> >>
>> >> I would like to move this all to a better root (not
>> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to 
>> >> write directly in trunk (??) and need to get moving on to other things.
>> >>
>> >> There is help on the ctakes wiki:
>> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> >> Though I should probably add a few items ...
>> >>
>> >>
>> >> Sean
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> >> Sent: Tuesday, December 08, 2015 12:51 PM
>> >> To: dev@ctakes.apache.org
>> >> Subject: RE: ctakes with icd10
>> >>
>> >> Not to perpetuate the instructions again but I sent these out not 
>> >> long ago when I was going through the process and Sean was helping me.
>> >>
>> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>> >>         3. Run DictionaryCreator2
>> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >>         4. Run CodeMapCreator
>> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>> >>
>> >> Thanks,
>> >> Brandon
>> >>
>> >> -----Original Message-----
>> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> >> Sent: Tuesday, December 08, 2015 12:47 PM
>> >> To: dev@ctakes.apache.org
>> >> Subject: Re: ctakes with icd10
>> >>
>> >> This seems like a pretty common request and with such an old
>> >> version of UMLS database shipped with cTAKES it's only going to get
>> >> worse. I've been wanting to build a dictionary using the latest UMLS
>> >> release (as well as a custom database), so would be happy to write
>> >> up the steps as I go through it. That assumes that I can dig up
>> >> the instructions in the dev list.
>> >>
>> >> - Dave
>> >>
>> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
>> >> Sean.Finan@childrens.harvard.edu> wrote:
>> >>
>> >> > Hi Alaa,
>> >> >
>> >> > The -shortest- answer is that you'll need to run the dictionary
>> >> > creation tool.  There are instructions in older devlist threads.
>> >> > By default the dictionary creation tool does add icd9 and icd10
>> >> > tables to the dictionary.
>> >> > The problem is that in Umls 2011AB those codes weren't very well
>> >> > populated.  The 2015AB icd# set is much more rich so those
>> >> > tables should be pretty good.  Then in ctakes you would look up
>> >> > annotations by icd9 or icd10 codes instead of by cui:
>> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>> >> >
>> >> > Sean
>> >> >
>> >> > -----Original Message-----
>> >> > From: Savova, Guergana
>> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> >> > Sent: Tuesday, December 08, 2015 12:17 PM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: RE: ctakes with icd10
>> >> >
>> >> > Hi Alaa,
>> >> > You need to create a resource off the terminology/ontology you 
>> >> > want to use (in this case ICD9 or ICD10). Then run that resource 
>> >> > with cTAKES for the fast dictionary lookup. There is cTAKES code 
>> >> > and some documentation on how to create that resource. By 
>> >> > default, cTAKES runs with a resource created from the English 
>> >> > version of SNOMED CT and RxNORM.
>> >> > Hope this helps.
>> >> > --Guergana
>> >> >
>> >> > -----Original Message-----
>> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> >> > Sent: Tuesday, December 8, 2015 10:01 AM
>> >> > To: dev@ctakes.apache.org
>> >> > Subject: ctakes with icd10
>> >> >
>> >> > Hi,
>> >> >
>> >> > I downloaded Latest umls version, and I want to know how to make 
>> >> > ctakes work with icd10 and icd9.
>> >> >
>> >> >
>> >> > Thanks
>> >> >
>> >>
>> >>
>> >> IMPORTANT WARNING: The information in this message (and the 
>> >> documents attached to it, if any) is confidential and may be legally privileged.
>> >> It is intended solely for the addressee. Access to this message by 
>> >> anyone else is unauthorized. If you are not the intended 
>> >> recipient, any disclosure, copying, distribution or any action 
>> >> taken, or omitted to be taken, in reliance on it is prohibited and 
>> >> may be unlawful. If you have received this message in error, 
>> >> please delete all electronic copies of this message (and the 
>> >> documents attached to it, if any), destroy any hard copies you may 
>> >> have created and notify me immediately
>> by replying to this email. Thank you.
>> >>
>> >> Geisinger Health System utilizes an encryption process to 
>> >> safeguard Protected Health Information and other confidential data 
>> >> contained in external e-mail messages. If email is encrypted, the 
>> >> recipient will receive an e-mail instructing them to sign on to 
>> >> the Geisinger Health System Secure E-mail Message Center to retrieve the encrypted e-mail.
>> >>
>>
>
>
>
> --
> Eng Alaa Al-Barari
> phone 0599297470
>



--
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
So basically, it looks like the path had "Desktop" with a capital D; that's
why it did not work.

I ended up with rows like this inside ctakesicd2015.script:

INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')
INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')
INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')


Does this mean it worked?
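
A note on the question above: HSQLDB in file mode keeps the contents of
in-memory (MEMORY) tables as plain SQL statements in the .script file, so
INSERT INTO CUI_TERMS rows there are the dictionary data itself. A minimal
Python sketch for counting those rows per table follows. It assumes one
statement per line and unquoted table names, as in the rows shown above;
count_rows is a made-up name for this illustration and is not part of the
dictionary tool.

```python
import re
from collections import Counter

# Matches the start of a data row such as "INSERT INTO CUI_TERMS VALUES(".
# Assumption: unquoted table names, one statement per line.
INSERT_ROW = re.compile(r"^INSERT INTO (\w+) VALUES\(")


def count_rows(script_lines):
    """Count INSERT statements per table in the lines of an HSQLDB .script file."""
    counts = Counter()
    for line in script_lines:
        match = INSERT_ROW.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The three rows quoted in this message, each on a single line.
sample_rows = [
    "INSERT INTO CUI_TERMS VALUES(2723481,8,15,'magnesium sulfate 1000 mg / 50 ml - nacl 0 . 9 % intravenous solution','nacl')",
    "INSERT INTO CUI_TERMS VALUES(2723481,9,16,'magnesium sulfate , 2 g / 100 ml - nacl 0 . 9 % intravenous solution','nacl')",
    "INSERT INTO CUI_TERMS VALUES(2723481,0,7,'magnesium sulfate 20 mg / ml injection','magnesium')",
]

for table, rows in sorted(count_rows(sample_rows).items()):
    print(f"{table}: {rows} rows")
```

To check a real build, pass an open file handle for the .script file instead
of sample_rows. A CUI_TERMS count in the hundreds of thousands would be in
line with the term counts Sean reported for the 2015 dictionaries.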








-- 
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
Thanks Finan and Brandon, your help is much appreciated.

I downloaded the dictionary tool from
https://svn.apache.org/repos/asf/ctakes/sandbox/dictionarytool/bin/dictionarytool.zip
I hope it's the latest and bug-free.


My run command is:

java -cp ./dictionarytool.jar:lib/*
org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls
/home/abarari/Desktop/umls/2015AB/META/ -atui
./data/optional/CtakesAnatTuis.txt -db
jdbc:hsqldb:file:/home/abarari/Desktop/dictionarytool/output/ctakesicd2015
-tbl CUI_TERMS -df ./data/optional/ -src ./data/small/ConversionSources.txt
-tui ./data/optional/CtakesAllTuis.txt



I am running on Ubuntu, by the way. Anyway, under
/home/abarari/Desktop/dictionarytool/output/
there is only:

abarari@ubuntu:~/Desktop/dictionarytool/output$ ls
ctakesicd2015.log  ctakesicd2015.properties  ctakesicd2015.script


Where is the database? Am I doing something wrong? Do I need to create
the database before executing the dictionary tool, or what?


I found a couple of issues in the dictionary tool; it does not work well with
relative paths.
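
Since relative paths caused trouble, one option is to resolve every path to
an absolute one before calling the tool. The Python sketch below assembles
the same DictionaryCreator2 arguments as the command above. build_command
and its parameters are hypothetical names for this illustration, and the
data/ and output/ layout under the tool directory is assumed from that
command rather than taken from the tool's documentation.

```python
import os
from pathlib import Path


def build_command(tool_dir, umls_meta, db_name):
    """Assemble the DictionaryCreator2 call with every path made absolute."""
    tool = Path(tool_dir).expanduser().resolve()
    umls = Path(umls_meta).expanduser().resolve()
    optional = tool / "data" / "optional"
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    classpath = os.pathsep.join(
        [str(tool / "dictionarytool.jar"), str(tool / "lib" / "*")]
    )
    return [
        "java", "-cp", classpath,
        "org.apache.ctakes.dictionarytool.DictionaryCreator2",
        # Trailing separators are kept only to mirror the original command.
        "-umls", str(umls) + os.sep,
        "-atui", str(optional / "CtakesAnatTuis.txt"),
        "-db", "jdbc:hsqldb:file:" + str(tool / "output" / db_name),
        "-tbl", "CUI_TERMS",
        "-df", str(optional) + os.sep,
        "-src", str(tool / "data" / "small" / "ConversionSources.txt"),
        "-tui", str(optional / "CtakesAllTuis.txt"),
    ]


# Relative inputs come back as absolute paths.
command = build_command(".", "../umls/2015AB/META", "ctakesicd2015")
print(" ".join(command))
```

Passing the list to subprocess.run(command) avoids the shell, so the lib/*
wildcard reaches Java unexpanded and Java expands it to the jars in lib/
itself.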


On Wed, Dec 9, 2015 at 7:11 AM, Pei Chen <ch...@apache.org> wrote:

> Brandon,
> That sounds great!
> Please open a Jira ticket for any contributions (anyone should be able
> to create a Jira account).  There are some legal items built into the
> ASF Jira attachments for accepting contributions/donations.
> It will also credit the contributors with the merit appropriately.
> Anyone who is interested can follow the Jira item. (Even better if
> contributions were open discussion/open development.)
> --Pei
>
> On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
> <bd...@geisinger.edu> wrote:
> > I'd be interested in contributing to making the dictionary tool more
> > user friendly with a GUI.
> >
> > Thanks,
> > Brandon
> >
> > -----Original Message-----
> > From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> > Sent: Tuesday, December 08, 2015 6:12 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Hi Dave,
> >
> > I'm always happy to see interest in our stuff!
> >
> >>Step 1
> > I built the tool to be able to build a dictionary using anything in the
> > umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't
> > be a problem.  You just add it to the CtakesSources file (or create an
> > alternate file and point to it with -src).  To answer another of your
> > questions, there can be zero or more sources - you saw snomedct and
> > snomedct_us (each valid in a different umls version).
> > It also can include any semantic type, just add (or remove) the
> > appropriate tuis in a different data file.
> >
> >>Step 2
> > You have it right - you copy the templates to another location and
> > output to that location.  Otherwise you 'lose' your templates.
> >
> >>Step 3 and 4
> > The jar is built from source.  I need to (soon) check in updates to the
> > source, and at the same time I can check in a default prebuilt .jar  The
> > lib/ directory is in the source repository.
> >
> > Various people have toyed with the idea of putting the tool into a
> > ctakes module, putting it into an "installation package", making a gui ...
> > The best option (imo) is probably to make an easy to use gui and keep a
> > pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a
> > chance to do that ...
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: David Kincaid [mailto:kincaid.dave@gmail.com]
> > Sent: Tuesday, December 08, 2015 4:57 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
> >
> > Thanks, Sean! It's great that cTAKES may soon have an up to date
> > database out of the box. Hopefully it will cut down on the need for many
> > to build their own DB's. Thank you much for doing that.
> >
> > Unfortunately, I still will need to build a custom one for us. I work in
> > veterinary medicine so I need to add in the veterinary extension for
> > SNOMED-CT into the database.
> >
> > I looked over the steps below that Brandon included and have some
> > questions:
> >
> > step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT"
> > to "SNOMEDCT_US". The file that I have has two lines in it. First line is
> > SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.
> >
> > step 2 should reference the two scripts as being in
> > resource/memdbtemplate so others don't have to search for them. Not sure
> > what it means to move them to "location to put new UMLS DB". Does that mean
> > move them into a new directory where the newly created UMLS DB will get
> > written?
> >
> > steps 3 and 4 for running the tools reference dictionarytool.jar which
> > doesn't exist. Does one need to build that somehow from the source before
> > running it? The command line also adds "lib/*" to the classpath. Is that
> > the lib directory inside the dictionarytool source code or some other
> > location?
> >
> > What else would I need to do to include the SNOMED-CT Veterinary
> > Extension along with the snomedct and rxnorm sources?
> >
> > I'll probably not have time to try this out for a while yet, but when I
> > do I'd be happy to write up an easy to follow tutorial for building a
> > custom dictionary assuming I am able to get it to work.
> >
> > Has anyone considered making this tool available outside of the source
> > code itself? Like including it in the main cTAKES release? It seems there
> > is demand for it.
> >
> > - Dave
> >
> > On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
> >
> >> Hi Brandon, thanks for finding and forwarding the instructions!
> >>
> >> I have checked in two new hsqldb dictionaries, both from the 2015AB
> >> version of the UMLS.  They both have codes for snomedct_us, rxnorm,
> >> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term
> >> mappings.
> >>
> >> One uses cuis filtered by snomed and rxnorm, the other adds cuis
> >> filtered by icd9 and icd10.
> >> What this means:  Cuis that exist for a [filter source] are added to
> >> the dictionary, as are all text variations from all sources that
> >> contain that cui.  Both dictionaries also use the standard ctakes
> >> semantic group tui filters.
> >>
> >> The names are ctakessnorx2015 and ctakesicd2015
> >>
> >> The snomed rxnorm :
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
> >>
> >> The snomed rxnorm icd9 icd10:
> >>
> >> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
> >>
> >> The svn root for the whole ugly thing is:
> >>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
> >>
> >> Stats:
> >> ctakessnorx2015
> >> 545,913 Terms
> >> 229,251 Concepts (Cuis)
> >> 272,987 Snomed codes
> >> 32,419 Rxnorm codes
> >> 11,321 icd9 codes
> >> 61 icd10 codes
> >>
> >> Ctakesicd2015
> >> 611,230 Terms
> >> 282,211 Concepts
> >> 18,626 icd9 codes
> >> 45,818 icd10 codes
> >> Snomed and Rxnorm counts are the same
> >>
> >> So, adding the icd filters gave us an extra ~53,000 concepts and
> >> ~65,000 terms.
> >>
> >> I would like to move this all to a better root (not
> >> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
> >> write directly in trunk (??) and need to get moving on to other things.
> >>
> >> There is help on the ctakes wiki:
> >> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> >> Though I should probably add a few items ...
> >>
> >>
> >> Sean
> >>
> >>
> >> -----Original Message-----
> >> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> >> Sent: Tuesday, December 08, 2015 12:51 PM
> >> To: dev@ctakes.apache.org
> >> Subject: RE: ctakes with icd10
> >>
> >> Not to perpetuate the instructions again but I sent these out not long
> >> ago when I was going through the process and Sean was helping me.
> >>
> >>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
> >>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
> >>         3. Run DictionaryCreator2
> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>         4. Run CodeMapCreator
> >>            java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> >>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
> >>
> >> Thanks,
> >> Brandon
> >>
> >> -----Original Message-----
> >> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> >> Sent: Tuesday, December 08, 2015 12:47 PM
> >> To: dev@ctakes.apache.org
> >> Subject: Re: ctakes with icd10
> >>
> >> This seems like a pretty common request and with such an old version
> >> of UMLS database shipped with cTAKES it's only going to get worse.
> >> I've been wanting to build a dictionary using the latest UMLS release
> >> (as well as a custom database), so would be happy to write up the
> >> steps as I go through it. That assumes that I can dig up the
> >> instructions in the dev list.
> >>
> >> - Dave
> >>
> >> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>
> >> > Hi Alaa,
> >> >
> >> > The -shortest- answer is that you'll need to run the dictionary
> >> > creation tool.  There are instructions in older devlist threads.  By
> >> > default the dictionary creation tool does add icd9 and icd10 tables
> >> > to the dictionary.
> >> > The problem is that in Umls 2011AB those codes weren't very well
> >> > populated.  The 2015AB icd# set is much more rich so those tables
> >> > should be pretty good.  Then in ctakes you would look up annotations
> >> > by icd9 or icd10 codes instead of by cui:
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> >> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >> >
> >> > Sean
> >> >
> >> > -----Original Message-----
> >> > From: Savova, Guergana
> >> > [mailto:Guergana.Savova@childrens.harvard.edu]
> >> > Sent: Tuesday, December 08, 2015 12:17 PM
> >> > To: dev@ctakes.apache.org
> >> > Subject: RE: ctakes with icd10
> >> >
> >> > Hi Alaa,
> >> > You need to create a resource off the terminology/ontology you want
> >> > to use (in this case ICD9 or ICD10). Then run that resource with
> >> > cTAKES for the fast dictionary lookup. There is cTAKES code and some
> >> > documentation on how to create that resource. By default, cTAKES
> >> > runs with a resource created from the English version of SNOMED CT
> >> > and RxNORM.
> >> > Hope this helps.
> >> > --Guergana
> >> >
> >> > -----Original Message-----
> >> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> >> > Sent: Tuesday, December 8, 2015 10:01 AM
> >> > To: dev@ctakes.apache.org
> >> > Subject: ctakes with icd10
> >> >
> >> > Hi,
> >> >
> >> > I downloaded Latest umls version, and I want to know how to make
> >> > ctakes work with icd10 and icd9.
> >> >
> >> >
> >> > Thanks
> >> >
> >>
> >>
> >> IMPORTANT WARNING: The information in this message (and the documents
> >> attached to it, if any) is confidential and may be legally privileged.
> >> It is intended solely for the addressee. Access to this message by
> >> anyone else is unauthorized. If you are not the intended recipient,
> >> any disclosure, copying, distribution or any action taken, or omitted
> >> to be taken, in reliance on it is prohibited and may be unlawful. If
> >> you have received this message in error, please delete all electronic
> >> copies of this message (and the documents attached to it, if any),
> >> destroy any hard copies you may have created and notify me immediately
> >> by replying to this email. Thank you.
> >>
> >> Geisinger Health System utilizes an encryption process to safeguard
> >> Protected Health Information and other confidential data contained in
> >> external e-mail messages. If email is encrypted, the recipient will
> >> receive an e-mail instructing them to sign on to the Geisinger Health
> >> System Secure E-mail Message Center to retrieve the encrypted e-mail.
> >>
>



-- 
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Pei Chen <ch...@apache.org>.
Brandon,
That sounds great!
Please open a Jira ticket for any contributions (anyone should be able
to create a Jira account).  There are some legal items built into the
ASF Jira attachments for accepting contributions/donations.
It will also credit the contributors with the merit appropriately.
Anyone who is interested can follow the Jira item. (Even better if
contributions were open discussion/open development.)
--Pei

On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
<bd...@geisinger.edu> wrote:
> I'd be interested in contributing to making the dictionary tool more user friendly with a GUI.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu]
> Sent: Tuesday, December 08, 2015 6:12 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!
>
> Hi Dave,
>
> I'm always happy to see interest in our stuff!
>
>>Step 1
> I built the tool to be able to build a dictionary using anything in the umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a problem.  You just add it to the CtakesSources file (or create an alternate file and point to it with -src).  To answer another of your questions, there can be zero or more sources - you saw snomedct and snomedct_us (each valid in a different umls version).
> It also can include any semantic type, just add (or remove) the appropriate tuis in a different data file.
>
>>Step 2
> You have it right - you copy the templates to another location and output to that location.  Otherwise you 'lose' your templates.
>
>>Step 3 and 4
> The jar is built from source.  I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar  The lib/ directory is in the source repository.
>
> Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ...  The best option (imo) is probably to make an easy to use gui and keep a pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a chance to do that ...
>
> Sean
>
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 4:57 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
>
> Thanks, Sean! It's great that cTAKES may soon have an up to date database out of the box. Hopefully it will cut down on the need for many to build their own DB's. Thank you much for doing that.
>
> Unfortunately, I still will need to build a custom one for us. I work in veterinary medicine so I need to add in the veterinary extension for SNOMED-CT into the database.
>
> I looked over the steps below that Brandon included and have some questions:
>
> step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines in it. First line is SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.
>
> step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. Not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?
>
> steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?
>
> What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?
>
> I'll probably not have time to try this out for a while yet, but when I do I'd be happy to write up an easy to follow tutorial for building a custom dictionary assuming I am able to get it to work.
>
> Has anyone considered making this tool available outside of the source code itself? Like including it in the main cTAKES release? It seems there is demand for it.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:
>
>> Hi Brandon, thanks for finding and forwarding the instructions!
>>
>> I have checked in two new hsqldb dictionaries, both from the 2015AB
>> version of the UMLS.  They both have codes for snomedct_us, rxnorm,
>> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>>
>> One uses cuis filtered by snomed and rxnorm, the other adds cuis
>> filtered by icd9 and icd10.
>> What this means:  Cuis that exist for a [filter source] are added to
>> the dictionary, as are all text variations from all sources that
>> contain that cui.  Both dictionaries also use the standard ctakes
>> semantic group tui filters.
>>
>> The names are ctakessnorx2015 and ctakesicd2015
>>
>> The snomed rxnorm :
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>>
>> The snomed rxnorm icd9 icd10:
>>
>> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>>
>> The svn root for the whole ugly thing is:
>>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>>
>> Stats:
>> ctakessnorx2015
>> 545,913 Terms
>> 229,251 Concepts (Cuis)
>> 272,987 Snomed codes
>> 32,419 Rxnorm codes
>> 11,321 icd9 codes
>> 61 icd10 codes
>>
>> Ctakesicd2015
>> 611,230 Terms
>> 282,211 Concepts
>> 18,626 icd9 codes
>> 45,818 icd10 codes
>> Snomed and Rxnorm counts are the same
>>
>> So, adding the icd filters gave us an extra ~53,000 concepts and
>> ~65,000 terms.
>>
>> I would like to move this all to a better root (not
>> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to
>> write directly in trunk (??) and need to get moving on to other things.
>>
>> There is help on the ctakes wiki:
>> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>> Though I should probably add a few items ...
>>
>>
>> Sean
>>
>>
>> -----Original Message-----
>> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
>> Sent: Tuesday, December 08, 2015 12:51 PM
>> To: dev@ctakes.apache.org
>> Subject: RE: ctakes with icd10
>>
>> Not to perpetuate the instructions again but I sent these out not long
>> ago when I was going through the process and Sean was helping me.
>>
>>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>>         3. Run DictionaryCreator2
>>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>         4. Run CodeMapCreator
>>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>>
>> Thanks,
>> Brandon
>>
>> -----Original Message-----
>> From: David Kincaid [mailto:kincaid.dave@gmail.com]
>> Sent: Tuesday, December 08, 2015 12:47 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: ctakes with icd10
>>
>> This seems like a pretty common request and with such an old version
>> of UMLS database shipped with cTAKES it's only going to get worse.
>> I've been wanting to build a dictionary using the latest UMLS release
>> (as well as a custom database), so would be happy to write up the
>> steps as I go through it. That assumes that I can dig up the instructions in the dev list.
>>
>> - Dave
>>
>> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
>> Sean.Finan@childrens.harvard.edu> wrote:
>>
>> > Hi Alaa,
>> >
>> > The -shortest- answer is that you'll need to run the dictionary
>> > creation tool.  There are instructions in older devlist threads.  By
>> > default the dictionary creation tool does add icd9 and icd10 tables
>> > to the dictionary.
>> > The problem is that in Umls 2011AB those codes weren't very well
>> > populated.  The 2015AB icd# set is much more rich so those tables
>> > should be pretty good.  Then in ctakes you would look up annotations
>> > by icd9 or icd10 codes instead of by cui:
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
>> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
>> >
>> > Sean
>> >
>> > -----Original Message-----
>> > From: Savova, Guergana
>> > [mailto:Guergana.Savova@childrens.harvard.edu]
>> > Sent: Tuesday, December 08, 2015 12:17 PM
>> > To: dev@ctakes.apache.org
>> > Subject: RE: ctakes with icd10
>> >
>> > Hi Alaa,
>> > You need to create a resource off the terminology/ontology you want
>> > to use (in this case ICD9 or ICD10). Then run that resource with
>> > cTAKES for the fast dictionary lookup. There is cTAKES code and some
>> > documentation on how to create that resource. By default, cTAKES
>> > runs with a resource created from the English version of SNOMED CT and RxNORM.
>> > Hope this helps.
>> > --Guergana
>> >
>> > -----Original Message-----
>> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
>> > Sent: Tuesday, December 8, 2015 10:01 AM
>> > To: dev@ctakes.apache.org
>> > Subject: ctakes with icd10
>> >
>> > Hi,
>> >
>> > I downloaded Latest umls version, and I want to know how to make
>> > ctakes work with icd10 and icd9.
>> >
>> >
>> > Thanks
>> >
>>
>>
>>

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Geise, Brandon D." <bd...@geisinger.edu>.
I'd be interested in contributing to making the dictionary tool more user friendly with a GUI.

Thanks,
Brandon

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Tuesday, December 08, 2015 6:12 PM
To: dev@ctakes.apache.org
Subject: RE: ctakes with icd10; 2015 versions available on sourceforge!

Hi Dave,

I'm always happy to see interest in our stuff!

>Step 1
I built the tool to be able to build a dictionary using anything in the umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a problem.  You just add it to the CtakesSources file (or create an alternate file and point to it with -src).  To answer another of your questions, there can be zero or more sources - you saw snomedct and snomedct_us (each valid in a different umls version).  
It also can include any semantic type, just add (or remove) the appropriate tuis in a different data file.

>Step 2
You have it right - you copy the templates to another location and output to that location.  Otherwise you 'lose' your templates.

>Step 3 and 4
The jar is built from source.  I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar  The lib/ directory is in the source repository.

Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ...  The best option (imo) is probably to make an easy to use gui and keep a pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a chance to do that ...

Sean


-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com]
Sent: Tuesday, December 08, 2015 4:57 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thanks, Sean! It's great that cTAKES may soon have an up to date database out of the box. Hopefully it will cut down on the need for many to build their own DB's. Thank you much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in veterinary medicine so I need to add in the veterinary extension for SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

step 1 says to "Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines in it. First line is SNOMED, second line is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. Not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?

What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do I'd be happy to write up an easy to follow tutorial for building a custom dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code itself? Like including it in the main cTAKES release? It seems there is demand for it.

- Dave

On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Brandon, thanks for finding and forwarding the instructions!
>
> I have checked in two new hsqldb dictionaries, both from the 2015AB 
> version of the UMLS.  They both have codes for snomedct_us, rxnorm, 
> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>
> One uses cuis filtered by snomed and rxnorm, the other adds cuis 
> filtered by icd9 and icd10.
> What this means:  Cuis that exist for a [filter source] are added to 
> the dictionary, as are all text variations from all sources that 
> contain that cui.  Both dictionaries also use the standard ctakes 
> semantic group tui filters.
>
> The names are ctakessnorx2015 and ctakesicd2015
>
> The snomed rxnorm :
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>
> The snomed rxnorm icd9 icd10:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> The svn root for the whole ugly thing is:
>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>
> Stats:
> ctakessnorx2015
> 545,913 Terms
> 229,251 Concepts (Cuis)
> 272,987 Snomed codes
> 32,419 Rxnorm codes
> 11,321 icd9 codes
> 61 icd10 codes
>
> Ctakesicd2015
> 611,230 Terms
> 282,211 Concepts
> 18,626 icd9 codes
> 45,818 icd10 codes
> Snomed and Rxnorm counts are the same
>
> So, adding the icd filters gave us an extra ~53,000 concepts and
> ~65,000 terms.
>
> I would like to move this all to a better root (not
> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to 
> write directly in trunk (??) and need to get moving on to other things.
>
> There is help on the ctakes wiki:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> Though I should probably add a few items ...
>
>
> Sean
>
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Tuesday, December 08, 2015 12:51 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Not to perpetuate the instructions again but I sent these out not long 
> ago when I was going through the process and Sean was helping me.
>
>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>         3. Run DictionaryCreator2
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         4. Run CodeMapCreator
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 12:47 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10
>
> This seems like a pretty common request and with such an old version 
> of UMLS database shipped with cTAKES it's only going to get worse.
> I've been wanting to build a dictionary using the latest UMLS release 
> (as well as a custom database), so would be happy to write up the 
> steps as I go through it. That assumes that I can dig up the instructions in the dev list.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Alaa,
> >
> > The -shortest- answer is that you'll need to run the dictionary 
> > creation tool.  There are instructions in older devlist threads.  By 
> > default the dictionary creation tool does add icd9 and icd10 tables
> > to the dictionary.
> > The problem is that in Umls 2011AB those codes weren't very well
> > populated.  The 2015AB icd# set is much more rich so those tables
> > should be pretty good.  Then in ctakes you would look up annotations
> > by icd9 or icd10 codes instead of by cui:
> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >
> > Sean
> >
> > -----Original Message-----
> > From: Savova, Guergana
> > [mailto:Guergana.Savova@childrens.harvard.edu]
> > Sent: Tuesday, December 08, 2015 12:17 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ctakes with icd10
> >
> > Hi Alaa,
> > You need to create a resource off the terminology/ontology you want 
> > to use (in this case ICD9 or ICD10). Then run that resource with 
> > cTAKES for the fast dictionary lookup. There is cTAKES code and some 
> > documentation on how to create that resource. By default, cTAKES 
> > runs with a resource created from the English version of SNOMED CT and RxNORM.
> > Hope this helps.
> > --Guergana
> >
> > -----Original Message-----
> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> > Sent: Tuesday, December 8, 2015 10:01 AM
> > To: dev@ctakes.apache.org
> > Subject: ctakes with icd10
> >
> > Hi,
> >
> > I downloaded Latest umls version, and I want to know how to make 
> > ctakes work with icd10 and icd9.
> >
> >
> > Thanks
> >
>
>
>

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Alaa,

I have a slightly updated version of the dictionary tool - only a couple of changes but I should check them in nonetheless after I've cleaned up a bit.  

I followed the process as emailed by Brandon Geise around 12:51 today.  My command parameters were:
[DictionaryCreator2]
-umls C:\Spiffy\umls\data\external\2015AB\META
-db jdbc:hsqldb:file:C:/Spiffy/rword_dict/output/umls2015icd_hsql/ctakesicd2015
-tbl CUI_TERMS
-fd ./data/tiny
-src ./data/tiny/CtakesSources.txt
-atui ./data/tiny/CtakesAnatTuis.txt
-tui ./data/tiny/CtakesSnomedTuis.txt

And I added ICD9CM and ICD10PCS to CtakesSources.txt

[CodeMapCreator]
-umls C:\Spiffy\umls\data\external\2015AB\META
-db jdbc:hsqldb:file:C:/Spiffy/rword_dict/output/umls2015icd_hsql/ctakesicd2015
-tbl kludge
-fd ./data/tiny
-src ./data/tiny/CtakesSources.txt

Obviously C:/Spiffy/ is my root of all evil ;^)
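
As a rough sketch only, the two parameter sets above can be combined with the "java -cp dictionarytool.jar;lib/*" invocation from Brandon's steps to give full command lines. The class name DictionaryToolCommands and its helper methods are made up for illustration; the Spiffy paths are the ones from this message and need replacing with your own; and the classpath assumes you run from the directory that holds a dictionarytool.jar you built yourself plus its lib/ directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: only assembles and prints the command lines, it does not run the tools.
final class DictionaryToolCommands {

    // Placeholder locations copied from this message; substitute your own.
    static final String UMLS_META = "C:\\Spiffy\\umls\\data\\external\\2015AB\\META";
    static final String DB_URL =
            "jdbc:hsqldb:file:C:/Spiffy/rword_dict/output/umls2015icd_hsql/ctakesicd2015";

    // Builds "java -cp dictionarytool.jar<sep>lib/* <mainClass> <toolArgs...>".
    // File.pathSeparator is ';' on Windows and ':' elsewhere.
    static List<String> command(String mainClass, String... toolArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add("dictionarytool.jar" + File.pathSeparator + "lib/*");
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(toolArgs));
        return cmd;
    }

    // Parameters listed under [DictionaryCreator2] above.
    static List<String> dictionaryCreator() {
        return command("org.apache.ctakes.dictionarytool.DictionaryCreator2",
                "-umls", UMLS_META,
                "-db", DB_URL,
                "-tbl", "CUI_TERMS",
                "-fd", "./data/tiny",
                "-src", "./data/tiny/CtakesSources.txt",
                "-atui", "./data/tiny/CtakesAnatTuis.txt",
                "-tui", "./data/tiny/CtakesSnomedTuis.txt");
    }

    // Parameters listed under [CodeMapCreator] above.
    static List<String> codeMapCreator() {
        return command("org.apache.ctakes.dictionarytool.CodeMapCreator",
                "-umls", UMLS_META,
                "-db", DB_URL,
                "-tbl", "kludge",
                "-fd", "./data/tiny",
                "-src", "./data/tiny/CtakesSources.txt");
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", dictionaryCreator()));
        System.out.println(String.join(" ", codeMapCreator()));
    }
}
```

Each list could also be handed to a java.lang.ProcessBuilder if you would rather launch the tools from code than paste the printed line into a shell.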


The .script and .properties files constitute the hsql database - which holds the dictionary.  You can copy them into the ctakes root resources/ directory parallel to the existing ctakessnorx/ directory:
[CTAKES_ROOT]/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

Then, in ctakes-dictionary-lookup-fast-res, edit your cTakesHsql.xml file:
[CTAKES_ROOT]/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Change both entries of  value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx"/>
to                      value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015"/>
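
A small sketch of the pattern behind that change, assuming (as the two values above suggest) that the JDBC URL is the fast/ resource directory, then the dictionary's directory name, then the database file name without the .script/.properties extension. The class DictionaryJdbcUrl and its methods are hypothetical, and the plain string replacement is only for illustration; editing the file by hand works just as well.

```java
// Hypothetical helper illustrating the URL pattern; not part of cTAKES.
final class DictionaryJdbcUrl {

    static final String FAST_DIR = "resources/org/apache/ctakes/dictionary/lookup/fast/";

    // e.g. "ctakesicd2015" -> jdbc:hsqldb:file:<FAST_DIR>ctakesicd2015/ctakesicd2015
    static String forDictionary(String name) {
        return "jdbc:hsqldb:file:" + FAST_DIR + name + "/" + name;
    }

    // Replaces every occurrence of the old dictionary URL in the XML text with the new one.
    static String repoint(String xml, String oldName, String newName) {
        return xml.replace(forDictionary(oldName), forDictionary(newName));
    }

    public static void main(String[] args) {
        // Two value attributes standing in for the two entries in cTakesHsql.xml.
        String xml = "value=\"" + forDictionary("ctakessnorx") + "\"/>\n"
                   + "value=\"" + forDictionary("ctakessnorx") + "\"/>\n";
        System.out.print(repoint(xml, "ctakessnorx", "ctakesicd2015"));
    }
}
```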


Then get rid of the lines that look like
            <property key="snomedTable" value="snomedct"/>
            <property key="rxnormTable" value="rxnorm"/>
            <property key="icd9Table" value="icd9cm"/>
            <property key="icd10Table" value="icd10pcs"/>
And replace them with
            <property key="snomedct_usTable" value="long"/>
            <property key="rxnormTable" value="text"/>
            <property key="icd9cmTable" value="text"/>
            <property key="icd10pcsTable" value="text"/>
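
Judging only from the four replacement lines above, each key looks like the UMLS source name from CtakesSources.txt in lower case followed by "Table", and each value is the column type of the code ("long" for snomed, "text" for the others). The sketch below generates the lines on that assumption; the naming rule is inferred from this message rather than taken from cTAKES documentation, and the class CodeTableProperties is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical generator for the <property> lines; the key rule is an inference.
final class CodeTableProperties {

    // "SNOMEDCT_US", "long" -> <property key="snomedct_usTable" value="long"/>
    static String propertyLine(String umlsSource, String valueType) {
        String key = umlsSource.toLowerCase(Locale.ROOT) + "Table";
        return "<property key=\"" + key + "\" value=\"" + valueType + "\"/>";
    }

    static List<String> propertyLines(Map<String, String> sourceToType) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : sourceToType.entrySet()) {
            lines.add(propertyLine(e.getKey(), e.getValue()));
        }
        return lines;
    }

    // The four sources and types used for ctakesicd2015 in this message.
    static Map<String, String> icd2015Sources() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("SNOMEDCT_US", "long");
        m.put("RXNORM", "text");
        m.put("ICD9CM", "text");
        m.put("ICD10PCS", "text");
        return m;
    }

    public static void main(String[] args) {
        for (String line : propertyLines(icd2015Sources())) {
            System.out.println(line);
        }
    }
}
```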


To get icd# related stuff, it might be easiest to use org.apache.ctakes.core.util.OntologyConceptUtil
getCodes( IdentifiedAnnotation, [schemeName] ) returns all of the annotation's codes for the given scheme name.
getCodes( IdentifiedAnnotations, [schemeName] ) as above, but for all given annotations
getCodes( JCas, [schemeName] ) as above, but for all annotations in the cas
getCodes( JCas, [lookupWindow], [schemeName] ) as above, but for all annotations in the lookup window
getCodes( IdentifiedAnnotation ) returns all of the annotation's codes for all schemes.
getCodes( IdentifiedAnnotations ) as above, but for all given annotations
getCodes( JCas, [lookupWindow] ) as above, but for all annotations in the lookup window

getSchemeCodes( IdentifiedAnnotation ) will return a hashtable of all the codes related to the annotation.  The keys of the hashtable are the schema names (icd9cm, icd10pcs, etc.) and the values are lists of all the codes in the schema.
getSchemeCodes( IdentifiedAnnotations ) as above, but for all given annotations
getSchemeCodes( JCas ) will return a hashtable with all codes in the cas -  useful if you are just looking for existence.

getAnnotationsByCode( JCas, [code] ) returns all annotations in the cas with the given code
getAnnotationsByCode( JCas, [lookupWindow], [code] ) as above, but in lookup window
getAnnotationsByCode( IdentifiedAnnotations, [code] ) as above, but in annotation collection

So, you could use something like:
getCodes( JCas, "ICD10PCS" ) to get all the icd10 codes found in the document.  For codes of interest, use
getAnnotationsByCode( JCas, [code] ) to get all the annotations in the document with that code.
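
To make that two-step lookup concrete without needing cTAKES on the classpath, here is a stand-in sketch using plain Java collections. It does not call OntologyConceptUtil: the class CodeLookupSketch, the Ann type, and the codes "P-001", "P-002", "N-100" and "N-200" are all invented placeholders, and the two methods only mimic what getCodes( JCas, schemeName ) and getAnnotationsByCode( JCas, code ) are described as doing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Stand-in model only; real code would use JCas, IdentifiedAnnotation and OntologyConceptUtil.
final class CodeLookupSketch {

    // Placeholder for an annotation: covered text plus its codes grouped by scheme name.
    static final class Ann {
        final String text;
        final Map<String, List<String>> schemeCodes;

        Ann(String text, Map<String, List<String>> schemeCodes) {
            this.text = text;
            this.schemeCodes = schemeCodes;
        }
    }

    // Mimics getCodes( JCas, schemeName ): every code of one scheme found in the document.
    static Set<String> codes(Collection<Ann> doc, String scheme) {
        Set<String> found = new TreeSet<>();
        for (Ann a : doc) {
            found.addAll(a.schemeCodes.getOrDefault(scheme, Collections.<String>emptyList()));
        }
        return found;
    }

    // Mimics getAnnotationsByCode( JCas, code ): every annotation carrying the given code.
    static List<Ann> annotationsByCode(Collection<Ann> doc, String code) {
        List<Ann> hits = new ArrayList<>();
        for (Ann a : doc) {
            for (List<String> schemeList : a.schemeCodes.values()) {
                if (schemeList.contains(code)) {
                    hits.add(a);
                    break; // count each annotation once
                }
            }
        }
        return hits;
    }

    // Three made-up annotations with made-up codes.
    static List<Ann> sampleDocument() {
        Map<String, List<String>> first = new LinkedHashMap<>();
        first.put("ICD10PCS", Arrays.asList("P-001"));
        first.put("ICD9CM", Arrays.asList("N-100"));
        Map<String, List<String>> second = new LinkedHashMap<>();
        second.put("ICD10PCS", Arrays.asList("P-001", "P-002"));
        Map<String, List<String>> third = new LinkedHashMap<>();
        third.put("ICD9CM", Arrays.asList("N-200"));
        return Arrays.asList(new Ann("first mention", first),
                             new Ann("second mention", second),
                             new Ann("third mention", third));
    }

    public static void main(String[] args) {
        List<Ann> doc = sampleDocument();
        for (String code : codes(doc, "ICD10PCS")) {
            System.out.println(code + " -> " + annotationsByCode(doc, code).size() + " annotation(s)");
        }
    }
}
```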

I know that is a lot to go over at once, and I am skimming the surface a bit, but I hope that it helps.

Must run,
Sean



-----Original Message-----
From: Alaa al Barari [mailto:alaa.albarari@gmail.com] 
Sent: Tuesday, December 08, 2015 6:49 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thank you very very much Finan,

I am still very new to cTAKES, so please bear with me.

1- Could you please post detailed instructions on how you built the dictionaries, or give as many examples of the steps as you can?
2- What exactly did you upload? I only see a script file and a properties file; what are those? And what do I need to change in cTAKES to make it work with them, for example to get icd10 codes?

I am sorry for being a newbie; I hope I will soon understand the whole thing and be effective.

Thanks in advance




--
Eng Alaa Al-Barari
phone 0599297470

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by Alaa al Barari <al...@gmail.com>.
Thank you very very much Finan,

I am still very new to ctakes, so please bear with me.

1- Could you please post detailed instructions on how you built the
dictionaries, or give as many examples of the steps as you can?
2- What exactly did you upload? I only see a script file and a properties
file; what are those? And what do I need to change in ctakes to make it work
with them, for example to get icd10 codes?

I am sorry for being a noob; I hope I will soon understand the whole thing
and be effective.

Thanks in advance

On Wed, Dec 9, 2015 at 1:12 AM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Dave,
>
> I'm always happy to see interest in our stuff!
>
> >Step 1
> I built the tool to be able to build a dictionary using anything in the
> umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't
> be a problem.  You just add it to the CtakesSources file (or create an
> alternate file and point to it with -src).  To answer another of your
> questions, there can be zero or more sources - you saw snomedct and
> snomedct_us (each valid in a different umls version).
> It also can include any semantic type, just add (or remove) the
> appropriate tuis in a different data file.
>
> >Step 2
> You have it right - you copy the templates to another location and output
> to that location.  Otherwise you 'lose' your templates.
>
> >Step 3 and 4
> The jar is built from source.  I need to (soon) check in updates to the
> source, and at the same time I can check in a default prebuilt .jar.  The
> lib/ directory is in the source repository.
>
> Various people have toyed with the idea of putting the tool into a ctakes
> module, putting it into an "installation package", making a gui ...  The
> best option (imo) is probably to make an easy to use gui and keep a
> pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a
> chance to do that ...
>
> Sean
>
>



-- 
Eng Alaa Al-Barari
phone 0599297470

RE: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Dave,

I'm always happy to see interest in our stuff!

>Step 1
I built the tool to be able to build a dictionary using anything in the umls - snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a problem.  You just add it to the CtakesSources file (or create an alternate file and point to it with -src).  To answer another of your questions, there can be zero or more sources - you saw snomedct and snomedct_us (each valid in a different umls version).  
It also can include any semantic type, just add (or remove) the appropriate tuis in a different data file.
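
As a rough illustration of how a source list acts as a filter, here is a small sketch of the rule stated elsewhere in this thread: cuis that exist for a filter source are kept, along with all text variations of those cuis from every source. The Row class, the source name OTHER_SRC and the cui values are hypothetical placeholders; this is not the dictionary tool's code, and how the real tool behaves with an empty source list is not something the sketch claims to reproduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SourceFilterSketch {

    /** Hypothetical simplified row: concept id, source vocabulary, term text. */
    static final class Row {
        final String cui;
        final String source;
        final String text;

        Row(String cui, String source, String text) {
            this.cui = cui;
            this.source = source;
            this.text = text;
        }
    }

    /** Keep every row whose cui appears at least once under one of the filter sources. */
    static List<Row> filter(List<Row> rows, Set<String> filterSources) {
        // Pass 1: collect the cuis that any filter source knows about.
        Set<String> wantedCuis = new HashSet<>();
        for (Row r : rows) {
            if (filterSources.contains(r.source)) {
                wantedCuis.add(r.cui);
            }
        }
        // Pass 2: keep all text variations of those cuis, whatever their source.
        List<Row> kept = new ArrayList<>();
        for (Row r : rows) {
            if (wantedCuis.contains(r.cui)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("C_A", "SNOMEDCT_US", "term one"),
                new Row("C_A", "OTHER_SRC", "term one, variant"),
                new Row("C_B", "OTHER_SRC", "term two"));
        for (Row r : filter(rows, Collections.singleton("SNOMEDCT_US"))) {
            System.out.println(r.cui + " | " + r.source + " | " + r.text);
        }
    }
}
```

Adding a line to the sources file corresponds to adding an entry to filterSources here, which is why an extra vocabulary can only grow the set of kept concepts.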

>Step 2
You have it right - you copy the templates to another location and output to that location.  Otherwise you 'lose' your templates.
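
A minimal sketch of that copy step, assuming the two template file names given in the quoted instructions (ctakesumls.properties and ctakesumls.script). The class name and the two directory arguments are hypothetical. The copy deliberately refuses to overwrite existing files, so neither the templates nor an already built database get clobbered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TemplateCopySketch {

    // File names as given in the quoted instructions.
    static final String[] TEMPLATE_FILES = { "ctakesumls.properties", "ctakesumls.script" };

    /** Copy the template files into the directory where the new database will be written. */
    static void copyTemplates(Path templateDir, Path newDbDir) throws IOException {
        Files.createDirectories(newDbDir);
        for (String name : TEMPLATE_FILES) {
            // No REPLACE_EXISTING option: Files.copy throws if the target already exists.
            Files.copy(templateDir.resolve(name), newDbDir.resolve(name));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TemplateCopySketch <templateDir> <newDbDir>");
            return;
        }
        copyTemplates(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```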

>Step 3 and 4
The jar is built from source.  I need to (soon) check in updates to the source, and at the same time I can check in a default prebuilt .jar.  The lib/ directory is in the source repository.
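
One practical note on the commands in the quoted instructions: they join the classpath entries with ';', which is the Windows separator, while Unix-like systems use ':'. The sketch below assembles the same argument list with File.pathSeparator so it comes out right on either platform. The main class names and the -umls, -atui, -db and -tbl flags are copied from the quoted instructions; the class DictionaryToolCommandSketch and the paths in main are placeholders, and the program only prints the commands rather than running them.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class DictionaryToolCommandSketch {

    /** Build the argument list for one of the dictionary tool main classes. */
    static List<String> command(String mainClass, String umlsMeta, String atuiFile,
                                String jdbcUrl, String table) {
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        String classpath = "dictionarytool.jar" + File.pathSeparator + "lib/*";
        return Arrays.asList("java", "-cp", classpath, mainClass,
                "-umls", umlsMeta,
                "-atui", atuiFile,
                "-db", jdbcUrl,
                "-tbl", table);
    }

    public static void main(String[] args) {
        // Placeholder locations; substitute the real UMLS META and database paths.
        String umlsMeta = "/path/to/umls/META";
        String atui = "./data/tiny/CtakesAnatTuis.txt";
        String db = "jdbc:hsqldb:file:/path/to/newDB/snorx2015";

        String[] mainClasses = {
            "org.apache.ctakes.dictionarytool.DictionaryCreator2",
            "org.apache.ctakes.dictionarytool.CodeMapCreator"
        };
        for (String mainClass : mainClasses) {
            System.out.println(String.join(" ", command(mainClass, umlsMeta, atui, db, "CUI_TERMS")));
        }
    }
}
```

The same list could be handed to a ProcessBuilder instead of being printed, once the jar has actually been built from source.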

Various people have toyed with the idea of putting the tool into a ctakes module, putting it into an "installation package", making a gui ...  The best option (imo) is probably to make an easy to use gui and keep a pre-built version in sandbox.  Someday, after the rainbow, maybe I'll get a chance to do that ...

Sean


-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com] 
Sent: Tuesday, December 08, 2015 4:57 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thanks, Sean! It's great that cTAKES may soon have an up to date database out of the box. Hopefully it will cut down on the need for many to build their own DB's. Thank you much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in veterinary medicine so I need to add in the veterinary extension for SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

Step 1 says to change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US". The file that I have has two lines in it: the first line is SNOMED, the second is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate so others don't have to search for them. Not sure what it means to move them to "location to put new UMLS DB". Does that mean move them into a new directory where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't exist. Does one need to build that somehow from the source before running it? The command line also adds "lib/*" to the classpath. Is that the lib directory inside the dictionarytool source code or some other location?

What else would I need to do to include the SNOMED-CT Veterinary Extension along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do I'd be happy to write up an easy to follow tutorial for building a custom dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code itself? Like including it in the main cTAKES release? It seems there is demand for it.

- Dave

On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Brandon, thanks for finding and forwarding the instructions!
>
> I have checked in two new hsqldb dictionaries, both from the 2015AB 
> version of the UMLS.  They both have codes for snomedct_us, rxnorm, 
> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>
> One uses cuis filtered by snomed and rxnorm, the other adds cuis 
> filtered by icd9 and icd10.
> What this means:  Cuis that exist for a [filter source] are added to 
> the dictionary, as are all text variations from all sources that 
> contain that cui.  Both dictionaries also use the standard ctakes 
> semantic group tui filters.
>
> The names are ctakessnorx2015 and ctakesicd2015
>
> The snomed rxnorm :
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>
> The snomed rxnorm icd9 icd10:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> The svn root for the whole ugly thing is:
>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>
> Stats:
> ctakessnorx2015
> 545,913 Terms
> 229,251 Concepts (Cuis)
> 272,987 Snomed codes
> 32,419 Rxnorm codes
> 11,321 icd9 codes
> 61 icd10 codes
>
> Ctakesicd2015
> 611,230 Terms
> 282,211 Concepts
> 18,626 icd9 codes
> 45,818 icd10 codes
> Snomed and Rxnorm counts are the same
>
> So, adding the icd filters gave us an extra ~53,000 concepts and 
> ~65,000 terms.
>
> I would like to move this all to a better root (not
> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to 
> write directly in trunk (??) and need to get moving on to other things.
>
> There is help on the ctakes wiki:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> Though I should probably add a few items ...
>
>
> Sean
>
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Tuesday, December 08, 2015 12:51 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Not to perpetuate the instructions again but I sent these out not long 
> ago when I was going through the process and Sean was helping me.
>
>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>         3. Run DictionaryCreator2
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         4. Run CodeMapCreator
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 12:47 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10
>
> This seems like a pretty common request and with such an old version 
> of UMLS database shipped with cTAKES it's only going to get worse. 
> I've been wanting to build a dictionary using the latest UMLS release 
> (as well as a custom database), so would be happy to write up the 
> steps as I go through it. That assumes that I can dig up the instructions in the dev list.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean < 
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Alaa,
> >
> > The -shortest- answer is that you'll need to run the dictionary
> > creation tool.  There are instructions in older devlist threads.  By
> > default the dictionary creation tool does add icd9 and icd10 tables
> > to the dictionary.
> > The problem is that in Umls 2011AB those codes weren't very well
> > populated.  The 2015AB icd# set is much more rich so those tables
> > should be pretty good.  Then in ctakes you would look up annotations
> > by icd9 or icd10 codes instead of by cui:
> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >
> > Sean
> >
> > -----Original Message-----
> > From: Savova, Guergana 
> > [mailto:Guergana.Savova@childrens.harvard.edu]
> > Sent: Tuesday, December 08, 2015 12:17 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ctakes with icd10
> >
> > Hi Alaa,
> > You need to create a resource off the terminology/ontology you want 
> > to use (in this case ICD9 or ICD10). Then run that resource with 
> > cTAKES for the fast dictionary lookup. There is cTAKES code and some 
> > documentation on how to create that resource. By default, cTAKES 
> > runs with a resource created from the English version of SNOMED CT and RxNORM.
> > Hope this helps.
> > --Guergana
> >
> > -----Original Message-----
> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> > Sent: Tuesday, December 8, 2015 10:01 AM
> > To: dev@ctakes.apache.org
> > Subject: ctakes with icd10
> >
> > Hi,
> >
> > I downloaded Latest umls version, and I want to know how to make 
> > ctakes work with icd10 and icd9.
> >
> >
> > Thanks
> >
>
>
> IMPORTANT WARNING: The information in this message (and the documents 
> attached to it, if any) is confidential and may be legally privileged. 
> It is intended solely for the addressee. Access to this message by 
> anyone else is unauthorized. If you are not the intended recipient, 
> any disclosure, copying, distribution or any action taken, or omitted 
> to be taken, in reliance on it is prohibited and may be unlawful. If 
> you have received this message in error, please delete all electronic 
> copies of this message (and the documents attached to it, if any), 
> destroy any hard copies you may have created and notify me immediately by replying to this email. Thank you.
>
> Geisinger Health System utilizes an encryption process to safeguard 
> Protected Health Information and other confidential data contained in 
> external e-mail messages. If email is encrypted, the recipient will 
> receive an e-mail instructing them to sign on to the Geisinger Health 
> System Secure E-mail Message Center to retrieve the encrypted e-mail.
>

Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by "Geise, Brandon D." <bd...@geisinger.edu>.
Hi David,

For step 1, the file may have been updated since then. When I wrote the instructions, only SNOMED was listed in the file.

For step 2, yes: this is a new directory location where the new dictionary will be created.

Yes, you need to build it or use the one Sean built and has in the repo.

Sorry the instructions aren't clear. They seemed to make sense when I wrote them.

Hope that helps,
Brandon
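
A minimal sketch of step 2 as clarified above, assuming the two template files sit in a memdbtemplate directory and the target is any new, writable directory. The function name and the paths in the usage note are hypothetical. The thread does not say whether the copies should be renamed to match the database name in the JDBC URL, so this sketch leaves the names unchanged.

```python
import shutil
from pathlib import Path

# Template files named in step 2 of the quoted instructions.
TEMPLATE_FILES = ("ctakesumls.properties", "ctakesumls.script")


def copy_db_template(template_dir, new_db_dir):
    """Copy the HSQLDB template files into the directory for the new dictionary.

    Both arguments are placeholders chosen by the caller; the target
    directory is created if it does not exist yet.
    """
    src = Path(template_dir)
    dst = Path(new_db_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in TEMPLATE_FILES:
        shutil.copy2(src / name, dst / name)
        copied.append(dst / name)
    return copied
```

Usage would look like copy_db_template("resource/memdbtemplate", "pathTonewDB"), with both paths adjusted to the local checkout.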


_____________________________
From: David Kincaid <ki...@gmail.com>
Sent: Tuesday, December 8, 2015 4:57 PM
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!
To: <de...@ctakes.apache.org>


Thanks, Sean! It's great that cTAKES may soon have an up-to-date database
out of the box. Hopefully it will cut down on the need for many to build
their own DBs. Thank you very much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in
veterinary medicine so I need to add in the veterinary extension for
SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

Step 1 says to change /data/default/CtakesSources.txt from "SNOMEDCT" to
"SNOMEDCT_US". The file that I have has two lines in it: the first line is
SNOMED, the second is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate
so others don't have to search for them. Not sure what it means to move
them to "location to put new UMLS DB". Does that mean move them into a new
directory where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which
doesn't exist. Does one need to build that somehow from the source before
running it? The command line also adds "lib/*" to the classpath. Is that
the lib directory inside the dictionarytool source code or some other
location?

What else would I need to do to include the SNOMED-CT Veterinary Extension
along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do
I'd be happy to write up an easy to follow tutorial for building a custom
dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code
itself? Like including it in the main cTAKES release? It seems there is
demand for it.

- Dave




Re: ctakes with icd10; 2015 versions available on sourceforge!

Posted by David Kincaid <ki...@gmail.com>.
Thanks, Sean! It's great that cTAKES may soon have an up-to-date database
out of the box. Hopefully it will cut down on the need for many to build
their own DBs. Thank you very much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in
veterinary medicine so I need to add in the veterinary extension for
SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

Step 1 says to change /data/default/CtakesSources.txt from "SNOMEDCT" to
"SNOMEDCT_US". The file that I have has two lines in it: the first line is
SNOMED, the second is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate
so others don't have to search for them. Not sure what it means to move
them to "location to put new UMLS DB". Does that mean move them into a new
directory where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which
doesn't exist. Does one need to build that somehow from the source before
running it? The command line also adds "lib/*" to the classpath. Is that
the lib directory inside the dictionarytool source code or some other
location?

What else would I need to do to include the SNOMED-CT Veterinary Extension
along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do
I'd be happy to write up an easy to follow tutorial for building a custom
dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code
itself? Like including it in the main cTAKES release? It seems there is
demand for it.

- Dave
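
On the classpath in steps 3 and 4: the ";" in the quoted commands is the Windows classpath separator, and other platforms use ":". The sketch below assembles the same command with the platform's separator. The class names and the -umls, -atui, -db and -tbl flags are taken from the quoted instructions; the function name is hypothetical, and the locations of dictionarytool.jar and lib are assumptions, since the thread leaves that question open.

```python
import os


def build_tool_command(main_class, umls_meta, atui_file, jdbc_url, table,
                       jar="dictionarytool.jar", lib_glob="lib/*",
                       sep=os.pathsep):
    """Return the java command from steps 3 and 4 as an argument list.

    jar and lib_glob are assumed locations; sep defaults to the
    platform's classpath separator (";" on Windows, ":" elsewhere).
    """
    classpath = sep.join([jar, lib_glob])
    return ["java", "-cp", classpath, main_class,
            "-umls", umls_meta,
            "-atui", atui_file,
            "-db", jdbc_url,
            "-tbl", table]


# Step 3 with the placeholder paths from the quoted instructions.
step3 = build_tool_command(
    "org.apache.ctakes.dictionarytool.DictionaryCreator2",
    "pathToUmls/META",
    "./data/tiny/CtakesAnatTuis.txt",
    "jdbc:hsqldb:file:pathTonewDB/snorx2015",
    "CUI_TERMS",
)
```

The list can be handed to subprocess.run without a shell; the Java launcher expands the "lib/*" classpath wildcard itself. Step 4 is the same call with org.apache.ctakes.dictionarytool.CodeMapCreator as the main class.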

On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Brandon, thanks for finding and forwarding the instructions!
>
> I have checked in two new hsqldb dictionaries, both from the 2015AB
> version of the UMLS.  They both have codes for snomedct_us, rxnorm, icd9cm
> and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>
> One uses cuis filtered by snomed and rxnorm, the other adds cuis filtered
> by icd9 and icd10.
> What this means:  Cuis that exist for a [filter source] are added to the
> dictionary, as are all text variations from all sources that contain that
> cui.  Both dictionaries also use the standard ctakes semantic group tui
> filters.
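
The filtering rule described in the paragraph above can be sketched on toy data. This is an illustration only, not the dictionary tool's code: the rows, the CUIs and the function name are made up, and the semantic group tui filter is left out.

```python
def build_dictionary(rows, filter_sources):
    """Apply the source filter described above to (cui, source, text) rows.

    A cui is kept if at least one row for it comes from a filter source;
    for every kept cui, the texts from ALL sources are added.
    """
    rows = list(rows)
    kept_cuis = {cui for cui, source, _ in rows if source in filter_sources}
    terms = {}
    for cui, _, text in rows:
        if cui in kept_cuis:
            terms.setdefault(cui, set()).add(text)
    return terms


# Hypothetical rows; the CUIs and texts are invented for illustration.
ROWS = [
    ("C0000001", "SNOMEDCT_US", "heart attack"),
    ("C0000001", "MSH", "myocardial infarction"),
    ("C0000002", "ICD10PCS", "example procedure"),
    ("C0000003", "MSH", "term with no filter source"),
]

snorx_like = build_dictionary(ROWS, {"SNOMEDCT_US", "RXNORM"})
icd_like = build_dictionary(ROWS, {"SNOMEDCT_US", "RXNORM", "ICD9CM", "ICD10PCS"})
```

With the snomed/rxnorm filter only C0000001 survives, but it carries both of its texts, including the one from a non-filter source. Adding the icd sources to the filter also brings in C0000002.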
>
> The names are ctakessnorx2015 and ctakesicd2015
>
> The snomed rxnorm :
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>
> The snomed rxnorm icd9 icd10:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> The svn root for the whole ugly thing is:
>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>
> Stats:
> ctakessnorx2015
> 545,913 Terms
> 229,251 Concepts (Cuis)
> 272,987 Snomed codes
> 32,419 Rxnorm codes
> 11,321 icd9 codes
> 61 icd10 codes
>
> Ctakesicd2015
> 611,230 Terms
> 282,211 Concepts
> 18,626 icd9 codes
> 45,818 icd10 codes
> Snomed and Rxnorm counts are the same
>
> So, adding the icd filters gave us an extra ~53,000 concepts and ~65,000
> terms.
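
The rounded figures above follow directly from the two stat blocks. A quick check using only the numbers quoted in the message:

```python
# Counts copied from the stats quoted above.
snorx2015 = {"terms": 545_913, "concepts": 229_251, "icd9": 11_321, "icd10": 61}
icd2015 = {"terms": 611_230, "concepts": 282_211, "icd9": 18_626, "icd10": 45_818}

# Difference per category: what the icd filters added.
extra = {key: icd2015[key] - snorx2015[key] for key in snorx2015}

for key, value in extra.items():
    print(f"{key}: +{value:,}")
```

That gives 52,960 extra concepts and 65,317 extra terms, matching the ~53,000 and ~65,000 quoted, plus 7,305 more icd9 codes and 45,757 more icd10 codes.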
>
> I would like to move this all to a better root (not
> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to write
> directly in trunk (??) and need to get moving on to other things.
>
> There is help on the ctakes wiki:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> Though I should probably add a few items ...
>
>
> Sean
>
>
> -----Original Message-----
> From: Geise, Brandon D. [mailto:bdgeise@geisinger.edu]
> Sent: Tuesday, December 08, 2015 12:51 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Not to perpetuate the instructions again but I sent these out not long ago
> when I was going through the process and Sean was helping me.
>
>         1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
>         2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to location to put new UMLS DB
>         3. Run DictionaryCreator2
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         4. Run CodeMapCreator
>         java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
>         5. Copy new DB files to new location and create a copy of cTakesHsql.xml and update dictionary location
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: David Kincaid [mailto:kincaid.dave@gmail.com]
> Sent: Tuesday, December 08, 2015 12:47 PM
> To: dev@ctakes.apache.org
> Subject: Re: ctakes with icd10
>
> This seems like a pretty common request and with such an old version of
> UMLS database shipped with cTAKES it's only going to get worse. I've been
> wanting to build a dictionary using the latest UMLS release (as well as a
> custom database), so would be happy to write up the steps as I go through
> it. That assumes that I can dig up the instructions in the dev list.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Alaa,
> >
> > The -shortest- answer is that you'll need to run the dictionary
> > creation tool.  There are instructions in older devlist threads.  By
> > default the dictionary creation tool does add icd9 and icd10 tables to
> > the dictionary.
> > The problem is that in Umls 2011AB those codes weren't very well
> > populated.  The 2015AB icd# set is much more rich so those tables
> > should be pretty good.  Then in ctakes you would look up annotations
> > by icd9 or icd10 codes instead of by cui:
> > OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icd#Code );
> > OntologyConceptUtil.getAnnotationsByCode( jcas, icd#Code );
> >
> > Sean
> >
> > -----Original Message-----
> > From: Savova, Guergana [mailto:Guergana.Savova@childrens.harvard.edu]
> > Sent: Tuesday, December 08, 2015 12:17 PM
> > To: dev@ctakes.apache.org
> > Subject: RE: ctakes with icd10
> >
> > Hi Alaa,
> > You need to create a resource off the terminology/ontology you want to
> > use (in this case ICD9 or ICD10). Then run that resource with cTAKES
> > for the fast dictionary lookup. There is cTAKES code and some
> > documentation on how to create that resource. By default, cTAKES runs
> > with a resource created from the English version of SNOMED CT and RxNORM.
> > Hope this helps.
> > --Guergana
> >
> > -----Original Message-----
> > From: Alaa al Barari [mailto:alaa.albarari@gmail.com]
> > Sent: Tuesday, December 8, 2015 10:01 AM
> > To: dev@ctakes.apache.org
> > Subject: ctakes with icd10
> >
> > Hi,
> >
> > I downloaded Latest umls version, and I want to know how to make
> > ctakes work with icd10 and icd9.
> >
> >
> > Thanks
> >
>
>