Posted to user@ctakes.apache.org by Stuti Awasthi <st...@hcl.com> on 2016/06/13 10:53:42 UTC

Integrate Custom Dictionary in cTakes.

Hi All,
I'm using cTakes 3.2 and would like to use a custom dictionary in place of UMLS to run a few trials. The documentation says that a new dictionary needs to be in BSV or HSQL format, but I couldn't find more details on this.
I need some help with the following:

*         Converting my custom dictionary to BSV format (| separated).

*         Integrating the new custom dictionary into cTakes. Which configuration files need to be modified to include my custom dictionary?

My present dictionary looks like this:
CUI         EnglishPreferredName
C1548760        Risk Codes - Aggressive
C1548761        Biohazard - Risk Codes
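
For illustration only (this is not from the thread), a minimal Python sketch of turning a whitespace-separated listing like the one above into pipe-separated lines. The function name `to_bsv` is hypothetical, and the `CUI|term` column order is an assumption; the columns that the cTakes dictionary lookup actually expects should be checked against the cTakes documentation.

```python
import re


def to_bsv(lines):
    """Convert 'CUI<whitespace>term' lines into 'CUI|term' lines."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so spaces inside the term survive.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # no term on this line
        cui, term = parts
        # Keep only rows whose first field looks like a CUI; this drops the header row.
        if not re.fullmatch(r"C\d{7}", cui):
            continue
        rows.append(f"{cui}|{term}")
    return rows


sample = [
    "CUI         EnglishPreferredName",
    "C1548760        Risk Codes - Aggressive",
    "C1548761        Biohazard - Risk Codes",
]
for row in to_bsv(sample):
    print(row)
```

A term that itself contains a `|` character would break this layout, so such terms would need to be cleaned up or escaped first.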

Thanks in advance.

Regards
Stuti Awasthi




::DISCLAIMER::
----------------------------------------------------------------------------------------------------------------------------------------------------

The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only.
E-mail transmission is not guaranteed to be secure or error-free as information could be intercepted, corrupted,
lost, destroyed, arrive late or incomplete, or may contain viruses in transmission. The e mail and its contents
(with or without referred errors) shall therefore not attach any liability on the originator or HCL or its affiliates.
Views or opinions, if any, presented in this email are solely those of the author and may not necessarily reflect the
views or opinions of HCL or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification,
distribution and / or publication of this message without the prior written consent of authorized representative of
HCL is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Before opening any email and/or attachments, please check them for viruses and other defects.

----------------------------------------------------------------------------------------------------------------------------------------------------

RE: Integrate Custom Dictionary in cTakes.

Posted by Guy Engelhard <gu...@algotec.co.il>.
Hello Stuti,

Someone from my team investigated this and wrote the following manual for me. Unfortunately she has already left, and I don't have any more details to give you. We haven't continued with populating new dictionaries yet; it is still on our to-do list.

Using customized dictionaries:


1.       Use UMLS MetamorphoSys to extract a customized subset from the UMLS DB (e.g. SNOMEDCT, NCI), then use the ctakes dictionary tool to construct an HSQL DB based on the extracted UMLS subset.

a.       Note that the size of the dictionary affects the runtime.

She also wrote the following on upgrading the 2011AB UMLS that comes with ctakes to the 2015AA dictionary (I think it involves constructing the HSQL DB that is needed). Perhaps you can use this to piece together what needs to be done:


Currently, UMLS 2011AB is the only UMLS dictionary that is available as a ctakes-compatible HSQL DB. It can be downloaded from

http://sourceforge.net/projects/ctakesresources/files/

and is placed at:

<CTAKES HOME>\resources\org\apache\ctakes\dictionary\lookup\umls2011ab

e.g. E:\Program Files\apache-ctakes-3.2.2-rc2\resources\org\apache\ctakes\dictionary\lookup\umls2011ab



To generate ctakes-compatible dictionaries that are based on a newer UMLS version (e.g. UMLS 2015AB) or on a specific subset (e.g. only the SNOMEDCT source), use apache-ctakes-dictionary-tool, a package that is available as a project in my Eclipse workspace.



Example:

To generate an updated ctakes-dictionary with only terms from SNOMEDCT source, I did the following:



1.       Extract a SNOMEDCT subset from the latest UMLS version (UMLS_2015AA) (another email, "UMLS Metamorphosys subset creation", describes how to do this) and save the output at:

\\COMPUTER\Data\UMLS\2015AA_snomedct



2.       Create an empty cTAKES HSQL database. This can be done as follows:

a.       Copy the umls2011ab folder (E:\apache-ctakes-3.2.2-rc2\resources\org\apache\ctakes\dictionary\lookup\umls2011ab) as a new folder (e.g. E:\NLP\Data\UMLS\umls_scratch).

b.      Make the umls_scratch directory and all its sub-directories writable by clearing the read-only attribute in the directory's properties. Also open umls_scratch\umls.properties as a text file and change "readonly" to false.

c.       Run the HSQL manager as administrator (runManagerSwing.bat from E:\hsqldb-2.3.3\hsqldb\bin).

d.      In the Connect Window, choose "HSQL Database Engine Standalone" and set the following attributes for the other fields:

-          Driver: org.hsqldb.jdbcDriver

-          URL: jdbc:hsqldb:file:E:\NLP\Data\UMLS\umls_scratch\umls

-          User: SA (leave the password field empty)

e.      Delete the content of the UMLS_MS_2011AB table by executing the following SQL command:

DELETE FROM UMLS_MS_2011AB

f.        Exit from the HSQL manager.

g.       Now copy the umls_scratch directory as a new directory named umls_2015aa_snomedct (e.g. E:\NLP\Data\UMLS\umls_2015aa_snomedct).

h.      In the future you can use copies of the emptied umls_scratch directory whenever needed.
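
As a rough illustration of steps 2a and 2b above (not part of the original manual), the copy and the read-only changes could be scripted along the following lines. The function name `make_writable_copy` is hypothetical, and the sketch assumes the property appears in umls.properties as a single `readonly=true` line; the demo runs on a throwaway directory that merely stands in for umls2011ab.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def make_writable_copy(src_dir, dst_dir, props_name="umls.properties"):
    """Copy src_dir to dst_dir, clear read-only flags, and set readonly=false."""
    dst = Path(dst_dir)
    shutil.copytree(src_dir, dst)  # step 2a: copy the folder

    # Step 2b, first half: make the copy and everything under it writable.
    for path in [dst, *dst.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)

    # Step 2b, second half: flip the readonly entry in the properties file.
    props = dst / props_name
    lines = props.read_text(encoding="latin-1").splitlines()
    fixed = [
        "readonly=false" if line.replace(" ", "").lower() == "readonly=true" else line
        for line in lines
    ]
    props.write_text("\n".join(fixed) + "\n", encoding="latin-1")
    return dst


# Demo on a temporary directory; the file content here is made up.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "umls2011ab"
    src.mkdir()
    (src / "umls.properties").write_text("readonly=true\nother=value\n", encoding="latin-1")
    copy = make_writable_copy(src, Path(tmp) / "umls_scratch")
    print((copy / "umls.properties").read_text(encoding="latin-1"))
```

The original folder is left untouched; only the copy is modified. Emptying the table (steps 2c to 2f) still has to be done in the HSQL manager.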



3.       Add apache-ctakes-dictionary-tool to Eclipse as a new Java project (File --> New --> Java Project).

4.       [image001.png: image not included in the plain-text version]

5.       Copy the two files sources.txt and TUIs.txt into E:\NLP\Data\UMLS\umls_2015aa_snomedct.

6.       From Eclipse, run the DictionaryCreator (umls_ms) application of apache-ctakes-dictionary-tool with the following arguments (apache-ctakes-dictionary-tool --> src --> org.apache.ctakes.dictionarytool --> DictionaryCreator.java --> Run As --> Run Configurations --> Arguments):



-umls     \\COMPUTER\Data\UMLS\2015AA_snomedct\2015AA\META

-db         jdbc:hsqldb:file:\\COMPUTER\Data\ctakes\umls_2015aa_snomedct\hsql\umls

-tbl         UMLS_MS_2011AB

-tui         \\COMPUTER\Data\ctakes\umls_2015aa_snomedct\TUIs.txt

-src         \\COMPUTER\Data\ctakes\umls_2015aa_snomedct\sources.txt

-fw



7.       If you get an error, move everything in E:\NLP\Data\UMLS\umls_2015aa_snomedct other than sources.txt and TUIs.txt into the directory hsql, or else try removing the directory hsql from the path given in the -db argument.

8.       Run the HSQL manager again, but this time connect to umls_2015aa_snomedct (jdbc:hsqldb:file:\\COMPUTER\Data\ctakes\umls_2015aa_snomedct\hsql\umls) to check that the UMLS_MS_2011AB table has been populated correctly.
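
Since the -db, -tui and -src arguments in step 6 all point into the same dictionary directory, it may help to derive them from one base path so they cannot drift apart. The following is only a sketch: the helper name `dictionary_creator_args` is hypothetical, the flags are the ones listed in step 6, and the directory layout (an hsql sub-directory next to TUIs.txt and sources.txt) is assumed from the example paths.

```python
def dictionary_creator_args(umls_meta, db_dir, table="UMLS_MS_2011AB"):
    """Build the DictionaryCreator argument list from two base paths."""
    db_dir = db_dir.rstrip("\\")  # avoid a doubled separator when joining
    return [
        "-umls", umls_meta,
        "-db", "jdbc:hsqldb:file:" + db_dir + r"\hsql\umls",
        "-tbl", table,
        "-tui", db_dir + r"\TUIs.txt",
        "-src", db_dir + r"\sources.txt",
        "-fw",
    ]


args = dictionary_creator_args(
    r"\\COMPUTER\Data\UMLS\2015AA_snomedct\2015AA\META",
    r"\\COMPUTER\Data\ctakes\umls_2015aa_snomedct",
)
# One line that can be pasted into the Eclipse "Arguments" box.
print(" ".join(args))
```

If step 7 applies and the hsql directory has to be dropped from the -db path, only the one `-db` line in the helper needs to change.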


From: Stuti Awasthi [mailto:stutiawasthi@hcl.com]
Sent: Tuesday, June 14, 2016 8:20 AM
To: 'user@ctakes.apache.org'
Subject: RE: Integrate Custom Dictionary in cTakes.

Hello Everyone,
I'm still waiting for a response; even a few pointers would be helpful.

Thanks & Regards
Stuti Awasthi


RE: Integrate Custom Dictionary in cTakes.

Posted by Stuti Awasthi <st...@hcl.com>.
Hello Everyone,
I'm still waiting for a response; even a few pointers would be helpful.

Thanks & Regards
Stuti Awasthi
