Posted to user@ctakes.apache.org by Alan Simmons <al...@tempus.com> on 2016/12/27 22:06:11 UTC

expanding cTAKES to use concepts from vocabularies other than SNOMED and RXNorm

Hi. I've been working with cTAKES for a few weeks now. I'm running the
standard CPE from the command line and generating CAS files that include
SNOMED and RxNorm concepts.

I'd like to expand my annotation to include concepts from vocabularies
other than SNOMED and RxNorm--specifically, cancer-specific terms from the
NCI Thesaurus that are not in SNOMED, e.g., "Stage IB non-small cell lung
cancer" (UMLS CUI C1336139). What's the best way to accomplish this?

Regards,

Alan Simmons

-- 
J. Alan Simmons
Solution Architect

(c) +1.773.220.5018

-- 
This email and any attachments may contain privileged and confidential 
information and/or protected health information (PHI) that is protected by 
federal and state privacy laws.  It is intended solely for the use of 
Tempus Labs and the recipient(s) named above.  Nothing contained in this 
communication and any attachments thereto is intended to waive any 
privileges or rights of confidentiality.  If you are not the recipient, or 
the employee or agent responsible for delivering this message to the 
intended recipient, you are hereby notified that any review, dissemination, 
distribution, printing or copying of this email message and/or any 
attachments is strictly prohibited. If you have received this 
transmission in error, please notify us immediately at (877)-654-5544 and 
permanently delete this email and any attachments.

Re: expanding cTAKES to use concepts from vocabularies other than SNOMED and RXNorm

Posted by Alan Simmons <al...@tempus.com>.
Thanks, Guergana and Sean.

We've been using the dictionary tool to build an updated UMLS dictionary,
but have only been seeing SNOMED and RxNorm concepts in our output. After
receiving your message, we reviewed the dictionary tool.

It seems to us that to add a dictionary other than the defaults (SNOMED,
RxNorm, and ICD), we would need to make significant changes, including some
hard coding in a Java class. Before we go that route, we thought that we'd
ask for a sanity check.

It appears that we would need to:

   - Include new vocabularies in the dictionary tool's
   ConversionSources.txt, making it look more like the "optional" version
   than the "default" one (i.e.,
   https://svn.apache.org/repos/asf/ctakes/sandbox/dictionary-gui/data/default/ConversionSources.txt).
   Easy enough.
   - Add custom property keys for the desired dictionaries to the
   cTakesHsql.xml file. The default file currently has keys for SNOMED,
   RxNorm, ICD-9, and ICD-10. Also straightforward.
   - Update the code in the class
   org.apache.ctakes.dictionary.lookup2.concept.JdbcConceptFactory, which
   seems to be hard-coded to look for the SNOMED, RxNorm, etc. keys in
   cTakesHsql.xml (e.g., <property key="snomedTable" value="snomedct"/>),
   and then recompile it. This is something that we'd rather avoid, of course.

Is that all that we would need to do? Is there a simpler way?
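
To make the concern in the third step concrete, here is a minimal sketch of
the difference between looking up a fixed list of property keys and
discovering table keys generically. It is written in Python purely for
illustration and is not cTAKES code. Only the snomedTable line is taken from
the descriptor quoted above; the <properties> wrapper, the rxnormTable and
nciTable keys, their values, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor fragment. Only the snomedTable line is quoted in
# the thread; the wrapper element and the other keys are assumptions.
DESCRIPTOR = """
<properties>
  <property key="snomedTable" value="snomedct"/>
  <property key="rxnormTable" value="rxnorm"/>
  <property key="nciTable" value="nci"/>
</properties>
"""

# Assumed stand-in for a fixed list of keys compiled into a class.
HARD_CODED_KEYS = ("snomedTable", "rxnormTable")


def read_properties(xml_text):
    """Return a {key: value} dict for every <property> element."""
    root = ET.fromstring(xml_text.strip())
    return {p.get("key"): p.get("value") for p in root.iter("property")}


def tables_hard_coded(props):
    """Only the keys on the fixed list are seen; new ones are ignored."""
    return {k: props[k] for k in HARD_CODED_KEYS if k in props}


def tables_by_suffix(props, suffix="Table"):
    """Any key ending in the suffix is treated as a vocabulary table."""
    return {k: v for k, v in props.items() if k.endswith(suffix)}


if __name__ == "__main__":
    props = read_properties(DESCRIPTOR)
    print("fixed list:", sorted(tables_hard_coded(props)))
    print("by suffix: ", sorted(tables_by_suffix(props)))
```

With a fixed list, a newly added key such as the hypothetical nciTable is
silently skipped unless the list itself is edited and recompiled; with a
suffix convention it would be picked up from the descriptor alone. Whether
JdbcConceptFactory actually behaves like the first function is exactly the
question being asked here.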

Regards,

Alan


RE: expanding cTAKES to use concepts from vocabularies other than SNOMED and RXNorm

Posted by "Savova, Guergana" <Gu...@childrens.harvard.edu>.
Hi Alan,
There is a module for building a dictionary off any vocabulary. It was Sean Finan who wrote the code. Sean is out until Jan 3; I am sure he will get back to you when he returns from the holidays. From what I remember, the code is straightforward to use.
Happy Holidays!
--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Guergana.Savova@childrens.harvard.edu
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
ctakes.apache.org
thyme.healthnlp.org
cancer.healthnlp.org
share.healthnlp.org