Posted to dev@ctakes.apache.org by pratik agarwal <pr...@gmail.com> on 2016/12/05 11:20:54 UTC

Fwd: Dictionary in cTAKES

Hi everyone

I came across cTAKES fairly recently and I'm having some difficulty
understanding how it works. I need to map clinical text notes
to ICD-10-CM and CPT/HCPCS codes. From what I have read and tried, the
default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
ICD9CM.

I am currently trying to work with the user version of cTAKES in IntelliJ
IDEA with Oracle JDK 8.

It would be great if someone could help me out. I am really sorry if this
is too easy a problem, but I've been trying to solve it for a while and I'm
stuck.

I was able to extract ICD9CM codes from cTAKES with the default resources,
i.e. ctakessnorx.properties and ctakessnorx.script.

I wanted to get ICD10CM and ICD10PCS codes, so I downloaded the .script and
.properties files from this source:

https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

and made corresponding changes to the cTakesHsql.xml file as mentioned by
Sean in:

https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

But this doesn't seem to work. I played around a bit with the parameters in
the following lines:

        <property key="snomedct_usTable" value="long"/>
        <property key="rxnormTable" value="text"/>
        <property key="icd9cmTable" value="text"/>
        <property key="icd10pcsTable" value="text"/>

Basically, I was getting blank outputs after making the change.
I am using OntologyConceptUtil.getSchemeCodes(JCas) to get the outputs.

I was getting an error with rxnormTable, so I commented that line out, and
after that I was getting blank output. So I tried replacing value="text"
with value="icd9cm" for key="icd9cmTable" and it started returning
ICD9CM codes. But I couldn't get anything when I did the same with
ICD10PCS; I again got blank output.

Note: I did all this after commenting out:

        <property key="snomedTable" value="snomedct"/>
        <property key="rxnormTable" value="rxnorm"/>
        <property key="icd9Table" value="icd9cm"/>
        <property key="icd10Table" value="icd10pcs"/>


It would be great if someone could help me understand how the dictionary
mechanism works, and how to get ICD10CM and ICD10PCS codes from it.

(i) What are the keys and values mentioned above, and where can I find these
in the script or properties file? Is there a way I can access these? Please
help me understand how this works.

(ii) I have a csv file containing the ICD codes, with the code in Column 1
and the description in Column 2, and similarly for CPT/HCPCS codes. What are
the steps I need to take to make it work with
OntologyConceptUtil.getSchemeCodes(JCas)?


I saw on different forums that we can use the dictionary-gui tool from the
sandbox. But I don't understand which files I need to run in that folder,
where in the project tree I should place the folder to make it run, or
which parameters are required and where to change them, if any.

Thanks a lot.

Regards,
Pratik Agarwal

Re: Dictionary in cTAKES

Posted by nishant kumar <ku...@gmail.com>.
Hi Pratik,

You can create a custom dictionary as per your requirement using the
dictionary-gui tool. This tool is a standalone Java project and you can
install and run it separately from cTAKES. The following mail from the
archives will help:

http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jK3pzZ7WGA@mail.gmail.com%3E

Once you have the tool in your workspace, run CreatorGui.java. This will
bring up a GUI from which you can select the required vocabularies. After
the build is complete, the tool will create new script and xml files that
you can plug into your cTAKES installation. Note that before you can run the
dictionary-gui tool, you need a local installation of the UMLS.

thanks
Nishant




Re: Dictionary in cTAKES

Posted by pratik agarwal <pr...@gmail.com>.
Thanks Sean for the reply. You were right. The cTakesHsql.xml was being
used because I was calling the getFastPipeline method, which uses
AnalysisEngineFactory.createEngineDescription(), where the file resource
defaults to cTakesHsql.xml. Now, to answer how I am using cTAKES:

1. I am using cTAKES in IntelliJ IDEA. I downloaded the user version and
added the compiled binaries to my classpath. OntologyConceptUtil.java
was not in there, so I downloaded it separately from trunk and added it
to my classpath too.

2. I wrote a simple Java program that calls the getFastPipeline() method
from the ClinicalPipelineFactory class and then uses
OntologyConceptUtil.getSchemeCodes() on an IdentifiedAnnotation object to
get the codes in a document. When I used it with the original
cTakesHsql.xml, with ctakessnorx.script and ctakessnorx.properties as the
sources, I was getting codes like:

Entity: coenzyme Q10=== codes: {RXNORM=[21406], SNOMEDCT=[412129003, 412130008]}

I was also getting the ICD-9 codes, like:

wound opening=== codes: {ICD9CM=[870-897.99], SNOMEDCT=[269362000, 59091005, 125643001, 157351009, 157439006, 269347002]}

but when I modified the cTakesHsql.xml as you mentioned here:
https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html

Note: I did change
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx"/>

to
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015"/>
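When the JDBC URL is repointed like this, one thing worth ruling out is a path that does not resolve to the database files. An HSQLDB file database named X is stored as X.script and X.properties, so the URL above implies two specific files. The following is a minimal sketch, not cTAKES code: the class name HsqlFileCheck is hypothetical, and it checks the path relative to the working directory, whereas cTAKES may resolve it against its own resource locations.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the files an HSQLDB "file:" JDBC URL points at. */
final class HsqlFileCheck {

    private static final String PREFIX = "jdbc:hsqldb:file:";

    /** Returns the .script and .properties paths implied by the URL. */
    static List<Path> expectedFiles(String jdbcUrl) {
        if (!jdbcUrl.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an HSQLDB file URL: " + jdbcUrl);
        }
        String base = jdbcUrl.substring(PREFIX.length());
        // Connection properties, if any, follow a semicolon; drop them.
        int semi = base.indexOf(';');
        if (semi >= 0) {
            base = base.substring(0, semi);
        }
        List<Path> files = new ArrayList<>();
        files.add(Paths.get(base + ".script"));
        files.add(Paths.get(base + ".properties"));
        return files;
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0]
                : "jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015";
        // Report each expected file relative to the current working directory.
        for (Path p : expectedFiles(url)) {
            System.out.println((Files.exists(p) ? "found   " : "MISSING ") + p.toAbsolutePath());
        }
    }
}
```

If either file is reported missing from the directory the pipeline is launched in, the empty code maps may be a path problem rather than a key/value problem.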


I got:
wound opening=== codes:{}


So I tried changing <property key="icd9cm_2014Table" value="text"/> to
<property key="icd9cm_2014Table" value="icd9cm"/> and I got:

wound opening=== codes:{ICD9CM=[870-897.99]}

But no success with ICD10CM or RXNORM.

Then I followed the process described in the document you sent. I
managed to get the .script, .properties, .rc and .xml files as
expected. But even with that xml file, I'm simply getting empty values
printed:

wound opening=== codes:{}

It would be really great if you can help me understand what I might be
doing wrong.

Thanks and Best Regards.
Pratik Agarwal




RE: Dictionary in cTAKES

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Pratik,

How are you running ctakes?  If you are running it using the older uima style then editing the descriptor files (*Annotator.xml) as you have done should work.  If you are running it with a UimaFit class or a piper file then you will need to redirect to your custom dictionary config .xml in another manner.  The pains of progress …


1.        Let me know how you launch ctakes.

a.       If you are launching by directly running a class then you will need to override a default parameter functionally.  Edit your call

“AnalysisEngineFactory.createEngineDescription( DefaultJCasTermAnnotator.class );”

And add “,JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY, [my].xml” to the call right after “.class”.  Note the comma.

b.      If you are running with the DefaultFastPipeline.piper file (used by bin/runClinicalPipeline) then you can edit the piper file and add a line with “addParameters DictionaryDescriptor=[my].xml”.  Add it above the line “add DefaultJCasTermAnnotator”.  If you updated trunk within the last few days you can use “set” in place of “addParameters”.  The DefaultFastPipeline.piper is in resources/org/apache/ctakes/clinical/pipeline/piper/

2.       “key” and “value” indicate the name of a vocabulary table in the dictionary database and the datatype of the code values within that table.  It looks like all of your snomed and rxnorms were able to be stored as “long”, but the other vocabularies had at least one character or two decimals so they required “text”.

a.       All of the table names in the database are those listed as keys but without the “Table” suffix.  For instance, yours are “snomedct_us_2016_09_01” , “rxnorm_16aa_160906f” and so forth.

b.      You don’t need to change any named constants (*CODING_SCHEME=*) in the code to fetch your data.
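Point 2 can be illustrated with a small sketch. This is not cTAKES code: the class VocabularyTables and its methods are hypothetical, and treating every type other than “long” as free text is an assumption. It only shows the naming rule (table name = key without the “Table” suffix) and why a code such as 870-897.99 cannot live in a “long” table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the key/value pairs in the dictionary xml. */
final class VocabularyTables {

    private static final String SUFFIX = "Table";

    /** Maps each "<name>Table" key to its table name, keeping the declared code type. */
    static Map<String, String> tableTypes(Map<String, String> properties) {
        Map<String, String> tables = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            String key = e.getKey();
            // Only keys of the form "<name>Table" describe a vocabulary table.
            if (key.endsWith(SUFFIX) && key.length() > SUFFIX.length()) {
                tables.put(key.substring(0, key.length() - SUFFIX.length()), e.getValue());
            }
        }
        return tables;
    }

    /** True if the code fits the declared type; anything but "long" is treated as text (assumption). */
    static boolean fits(String code, String type) {
        if (!"long".equals(type)) {
            return true;
        }
        try {
            Long.parseLong(code);
            return true;
        } catch (NumberFormatException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("snomedct_us_2016_09_01Table", "long");
        props.put("icd9cm_2014Table", "text");
        tableTypes(props).forEach((table, type) -> System.out.println(table + " -> " + type));
        System.out.println(fits("412129003", "long"));   // a SNOMEDCT code from this thread
        System.out.println(fits("870-897.99", "long"));  // an ICD9CM range from this thread
    }
}
```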

If you are getting codes in the CPE then they should be available under OntologyConceptArray.

If you are getting codes programmatically then use the class OntologyConceptUtil.  It has a tonne of methods that can be used to obtain codes, for the entire document, for certain sections, for individual annotations, etc.
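The piper edit in 1.b is a single added line, and its position matters: it has to sit above the annotator line. A minimal sketch of that edit follows, assuming the piper file has been read into a list of lines. The class PiperDictionaryLine and the path path/to/myDictionary.xml are hypothetical, and the surrounding lines in main are placeholders rather than real piper content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: adds the DictionaryDescriptor line above the term annotator. */
final class PiperDictionaryLine {

    static final String TARGET = "add DefaultJCasTermAnnotator";

    /** Returns a copy of the piper lines with the parameter line placed above the annotator line. */
    static List<String> withDictionary(List<String> piperLines, String descriptorXml) {
        List<String> out = new ArrayList<>();
        boolean inserted = false;
        for (String line : piperLines) {
            // Insert once, directly above the first annotator line.
            if (!inserted && line.trim().startsWith(TARGET)) {
                out.add("addParameters DictionaryDescriptor=" + descriptorXml);
                inserted = true;
            }
            out.add(line);
        }
        if (!inserted) {
            throw new IllegalArgumentException("No line starting with: " + TARGET);
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder content; the real DefaultFastPipeline.piper has other lines around the annotator.
        List<String> piper = Arrays.asList(
                "(earlier piper lines)",
                "add DefaultJCasTermAnnotator",
                "(later piper lines)");
        // "path/to/myDictionary.xml" is a hypothetical location for the generated dictionary xml.
        for (String line : withDictionary(piper, "path/to/myDictionary.xml")) {
            System.out.println(line);
        }
    }
}
```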

I hope that the above is clear.  I will try to add all of this to some documentation asap and make it available publicly.  Don’t anybody hold your breath though …

Sean



From: pratik agarwal [mailto:pratikagarwal2203@gmail.com]
Sent: Friday, December 16, 2016 3:40 AM
To: Finan, Sean
Cc: dev@ctakes.apache.org
Subject: Re: Dictionary in cTAKES

Thanks Sean and Nishant for the help. Sean, the document you sent was really helpful. I was able to successfully create a dictionary using the dictionary-gui. But I'm still not able to use the dictionary. It would be great if you could help me out.

I got a .script file, a .properties file, a .rc file and a .xml file on running the dictionary-gui as Sean mentioned here:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jK3pzZ7WGA@mail.gmail.com%3E

Then I changed the file url in both UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to [new xml file name].xml  in the directory [cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/:

Here's the part where I changed it:
            <name>DictionaryDescriptorFile</name>
            <description/>
            <fileResourceSpecifier>
                  <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
            </fileResourceSpecifier>
            <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, in the line describing the dictionary resource used, I see that the cTakesHsql.xml is still being used instead of the new one. Here is what it looks like:

INFO DictionaryDescriptorParser - Parsing dictionary specifications: /home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Another issue I'm facing is, even when I simply replace the contents of cTakesHsql.xml with the contents of the new xml file, it's not returning any codes (ICD,RXNORM etc.), although the original cTakesHsql.xml was returning a few codes. I have a feeling this has to do with the keys and values in:

            <property key="snomedct_us_2016_09_01Table" value="long"/>
            <property key="rxnorm_16aa_160906fTable" value="long"/>
            <property key="icd10pcs_2017Table" value="text"/>
            <property key="icd10cm_2017Table" value="text"/>
            <property key="icd9cm_2014Table" value="text"/>

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location, to make cTAKES use my custom dictionary instead of the default one.

2. What do the key and value above actually correspond to? Do I need to make any changes to it? I saw a lot of class files that contain terms like "RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those files too?
For example, in "IdentifiedAnnotation.class", I can see lines like:


private static final Logger LOGGER = Logger.getLogger("IdentifiedAnnotationUtil");

public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";
public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";


Thanks and Best Regards
Pratik Agarwal


On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean <Se...@childrens.harvard.edu> wrote:
Hi Pratik,

It sounds like you are running using code from trunk.  That is good.

I have attached a document that outlines how you can use a dictionary creator gui to make a database with any umls source vocabulary that you need.  That would be section 6.1.  It also outlines use of a bsv (similar to csv) file in section 6.2.
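For a two-column csv of codes and descriptions, turning it into a bar-separated file is a small step. The sketch below assumes a plain code|description layout; the column layout cTAKES actually expects is described in the attached document, which is not reproduced in this thread, so the output format here is only an assumption. The class CsvToBsv is hypothetical, and the parser handles only a code in column 1 followed by an optionally quoted description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical converter from "code,description" csv lines to bar-separated lines. */
final class CsvToBsv {

    /** Converts each non-blank "code,description" line into "code|description". */
    static List<String> convert(List<String> csvLines) {
        List<String> out = new ArrayList<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // Split on the first comma only; the description may contain commas.
            int comma = line.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("No comma in: " + line);
            }
            String code = line.substring(0, comma).trim();
            String text = stripQuotes(line.substring(comma + 1).trim());
            // A bar inside the description would be read as a column break.
            out.add(code + "|" + text.replace('|', ' '));
        }
        return out;
    }

    /** Removes one pair of enclosing double quotes and unescapes doubled quotes. */
    private static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1).replace("\"\"", "\"");
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
                "A00.0,\"Cholera due to Vibrio cholerae 01, biovar cholerae\"",
                "A01.00,\"Typhoid fever, unspecified\"");
        for (String line : convert(csv)) {
            System.out.println(line);
        }
    }
}
```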

Please provide questions and feedback as I would like to improve this document before making it public on the ctakes website.

Some brief information on how the fast dictionary lookup works can be viewed here: https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup

Sean







Re: Dictionary in cTAKES

Posted by pratik agarwal <pr...@gmail.com>.
Thanks Sean and Nishant for the help. Sean, the document you sent was
really helpful. I was able to successfully create a dictionary using the
dictionary-gui. But I'm still not able to use the dictionary. It would be
great if you could help me out.

I got a .script file, a .properties file, a .rc file and a .xml file on
running the dictionary-gui as Sean mentioned here:
http://mail-archives.apache.org/mod_mbox/ctakes-dev/201601.mbox/%3CCA+jqmuyBcv-h67bxg=gummpVkE_khOXpSfRvSqx=jK3pzZ7WGA@mail.gmail.com%3E

Then I changed the file URL in both UmlsLookupAnnotator.xml and
UmlsOverlapLookupAnnotator.xml from cTakesHsql.xml to
[new xml file name].xml in the directory
[cTAKES root]/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/:

Here's the part where I changed it:

            <name>DictionaryDescriptorFile</name>
            <description/>
            <fileResourceSpecifier>
                <fileUrl>file:org/apache/ctakes/dictionary/lookup/fast/[new xml file name].xml</fileUrl>
            </fileResourceSpecifier>
            <implementationName>org.apache.ctakes.core.resource.FileResourceImpl</implementationName>

But when I run the program, in the line describing the dictionary resource
used, I see that the cTakesHsql.xml is still being used instead of the new
one. Here is what it looks like:

INFO DictionaryDescriptorParser - Parsing dictionary specifications:
/home/pratik/Desktop/cTAKES/out/production/cTAKES/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml
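
A possible check, assuming the descriptor is resolved from the classpath (the out/production path in the log above suggests this, but nothing in this thread confirms it): ask the class loader which copy of a resource it would return. The sketch below uses only the standard ClassLoader.getResource call; DescriptorLocator and locate are hypothetical names, not cTAKES classes.

```java
import java.net.URL;

// Hypothetical diagnostic (not part of cTAKES): reports which copy of a
// resource the class loader would return for a classpath-relative path.
final class DescriptorLocator {
    // Returns the URL of the first matching resource, or null if none is found.
    static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    // Usage: run with the same classpath as the pipeline, optionally passing
    // the resource path to look up as the first argument.
    public static void main(String[] args) {
        String path = args.length > 0
                ? args[0]
                : "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml";
        URL url = locate(path);
        System.out.println(url == null ? "not on classpath: " + path : url.toString());
    }
}
```

If the printed location is under the IDE build output rather than the edited source or desc folder, the running pipeline may be reading a stale or different copy of the file than the one that was changed.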

Another issue I'm facing is, even when I simply replace the contents of
cTakesHsql.xml with the contents of the new xml file, it's not returning
any codes (ICD, RXNORM etc.), although the original cTakesHsql.xml was
returning a few codes. I have a feeling this has to do with the keys and
values in:

            <property key="snomedct_us_2016_09_01Table" value="long"/>
            <property key="rxnorm_16aa_160906fTable" value="long"/>
            <property key="icd10pcs_2017Table" value="text"/>
            <property key="icd10cm_2017Table" value="text"/>
            <property key="icd9cm_2014Table" value="text"/>
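
To inspect such entries programmatically, a descriptor can be parsed with the JDK's XML parser and its property elements listed. This is only a sketch under assumptions: DescriptorProperties and tableProperties are hypothetical names, and treating every key that ends in "Table" as naming a database table is a reading of the naming pattern, not something established in this thread. What the values (long, text) mean is likewise not established here; the code only reports them.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of cTAKES): collects <property key="...Table"
// value="..."/> entries from descriptor XML text.
final class DescriptorProperties {
    private static final String SUFFIX = "Table";

    // Returns, in document order, a map from the key with its "Table" suffix
    // removed to the raw value attribute.
    static Map<String, String> tableProperties(String xml) {
        NodeList nodes;
        try {
            nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("property");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor XML", e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String key = property.getAttribute("key");
            if (key.endsWith(SUFFIX)) {
                String name = key.substring(0, key.length() - SUFFIX.length());
                result.put(name, property.getAttribute("value"));
            }
        }
        return result;
    }
}
```

Calling tableProperties on the text of the generated xml file gives a list of candidate names that can be checked against the tables present in the .script file.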

Can you please guide me on the following 2 questions:

1. Where do I need to change the resource xml file location to make cTAKES
use my custom dictionary instead of the default one?

2. What do the key and value above actually correspond to? Do I need to
make any changes to them? I saw a lot of class files that contain terms like
"RXNORM", "SNOMEDCT", "ICD9CM" etc. Do I need to make any changes in those
files too?
For example, in "IdentifiedAnnotation.class", I can see lines like:

private static final Logger LOGGER =
Logger.getLogger("IdentifiedAnnotationUtil");

public static final String CTAKES_SNOMED_CODING_SCHEME = "SNOMED";

public static final String CTAKES_RXNORM_CODING_SCHEME = "RXNORM";


Thanks and Best Regards
Pratik Agarwal


On Tue, Dec 6, 2016 at 8:19 PM, Finan, Sean
<Sean.Finan@childrens.harvard.edu> wrote:

> Hi Pratik,
>
> It sounds like you are running using code from trunk.  That is good.
>
> I have attached a document that outlines how you can use a dictionary
> creator gui to make a database with any umls source vocabulary that you
> need.  That would be section 6.1.  It also outlines use of a bsv (similar
> to csv) file in section 6.2.
>
> Please provide questions and feedback as I would like to improve this
> document before making it public on the ctakes website.
>
> Some brief information on how the fast dictionary lookup works can be
> viewed here:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
>
> Sean
>
>
>
>
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2203@gmail.com]
> Sent: Monday, December 05, 2016 6:21 AM
> To: user@ctakes.apache.org
> Subject: Fwd: Dictionary in cTAKES
>
> Hi everyone
>
> I came across cTAKES fairly recently and I'm facing some difficulties with
> understanding the working of it. I am required to map clinical text notes
> with the ICD-10-CM and CPT/HCPCS codes. From what I read, or tried, the
> default dictionaries used with the fast pipeline are SNOMEDCT, RXNORM and
> ICD9CM.
>
> I am currently trying to work with the user version of cTAKES in Intellij
> IDEA with Java Oracle JDK 8.
>
> It would be great if someone could help me out. I am really sorry if this
> is too easy a problem, but I've been trying to solve it for a while and I'm
> stuck.
>
> I was able to extract ICD9CM codes from cTAKES with the default resources
> i.e. ctakesnorx.properties and ctakesnorx.script
>
> I wanted to get ICD10CM and ICD10PCS codes, so I downloaded .script and
> .properties file from this source:
>
> https://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> and made corresponding changes to the cTakesHsql.xml file as mentioned by
> Sean in:
>
> https://www.mail-archive.com/dev@ctakes.apache.org/msg02597.html
>
> But this doesn't seem to work. I played around a bit with the parameters
> in the following lines:
>
>     <property key="snomedct_usTable" value="long"/>
>         <property key="rxnormTable" value="text"/>
>         <property key="icd9cmTable" value="text"/>
>         <property key="icd10pcsTable" value="text"/>
>
> Basically when I was getting blank outputs after making the change.
> I am using OntologyConceptUtil.getSchemeCodes(JCas) for getting the
> outputs.
>
> I was getting an error with rxnormTable. So I commented that line out. and
> after that I was getting blank output. So I tried replacing value="text"
> with value = "icd9cm" for key = "icd9cmTable" and it started returning
> ICD9CM codes. But I couldn't get anything when I did the same with
> ICD10PCS. I again got a blank output.
>
> Note: I did all this after commenting:
>
>         <property key="snomedTable" value="snomedct"/>
>         <property key="rxnormTable" value="rxnorm"/>
>         <property key="icd9Table" value="icd9cm"/>
>         <property key="icd10Table" value="icd10pcs"/>
>
>
> It would be great if someone could help me understand how the dictionary
> mechanism is working. Also, how to get ICD10CM codes and ICD10PCS codes
> from this.
>
> (i) What are the keys and values mentioned above and where can I find
> these in the script or properties file? Is there a way I can access these?
> Please help me understand how this is working.
>
> (ii) I have a csv file containing the ICD codes with the code in Column 1
> and description in Column 2 and similarly for CPT/HCPCS codes. What are the
> steps I need to take to make it work with
> OntologyConceptUtil.getSchemeCodes(JCas).
>
>
> I saw from different forums that we can use dictionary-gui tool from
> sandbox. But I am not really understanding which files do I need to run in
> that folder. Also, where in the project tree should I place this folder to
> make it run. Also, what are the parameters required and where do I change
> them, if any.
>
> Thanks a lot.
>
> Regards,
> Pratik Agarwal
>