Posted to dev@ctakes.apache.org by Ab...@cognizant.com on 2020/06/01 14:55:46 UTC

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Thank you Remy and Peter for your responses. I hope you are doing well and staying safe during this lockdown period. Could you please help me with the questions below about creating an additional dictionary?


·       How do I create an additional dictionary? Do you mean using the UMLS tool to create .script files from the .RRF files?

·       How can we point the cTAKES application to multiple dictionaries? Currently only sno_rx_16ab is configured. How can I tweak it to use multiple dictionaries simultaneously? Or do you mean creating a fresh dictionary with all the vocabularies and pointing cTAKES at just that one?

As I understand it, Remy was explaining how to edit the existing dictionary, where there are two scenarios: one with an existing CUI and one with a non-existing CUI. Could you please answer the questions below about this?


·       So for these edits I have to add INSERT statements for the respective tables in the sno_rx_16ab.script file, right? Do I need to make any other changes for these tokens to be reflected in cTAKES?

·       If it is a non-existing CUI, I can get the respective CUI and TUI from https://uts.nlm.nih.gov//metathesaurus.html, right?

·       Based on the source, I have to add an entry to the respective table, right? For example SNOMED, RxNORM or ICD 10, and a CUI will belong to only one of them and not to all. Correct me if I am wrong.

·       The PREFTERM table will have only one entry for each CUI, right? Basically it is a one-to-one mapping between CUI and PREFTERM. Correct me if I am wrong.


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap', since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')

In the second case, the CUI does not exist yet and you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
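
As an illustration of the CUI_TERMS layout described above, here is a minimal Python sketch that derives RINDEX and TCOUNT from an already tokenized term and formats the row. It is not part of cTAKES; the function name cui_terms_insert is made up for this example, and it assumes the tokens are already split the way the cTAKES tokenizer would split them.

```python
from typing import List


def cui_terms_insert(cui: int, tokens: List[str], rare_word: str) -> str:
    """Format one CUI_TERMS row: (CUI, RINDEX, TCOUNT, TEXT, RWORD)."""
    lowered = [t.lower() for t in tokens]   # TEXT is described as lowercase
    rword = rare_word.lower()
    rindex = lowered.index(rword)           # index of RWORD in TEXT, first token is 0
    tcount = len(lowered)                   # number of tokens in TEXT
    text = " ".join(lowered)                # tokens are separated by single spaces

    def quote(value: str) -> str:
        # Single quotes are escaped by doubling them
        return "'" + value.replace("'", "''") + "'"

    return (f"INSERT INTO CUI_TERMS VALUES("
            f"{cui},{rindex},{tcount},{quote(text)},{quote(rword)})")


print(cui_terms_insert(200845, ["pap", "smear"], "pap"))
```

For the 'pap smear' case this prints the same line as given above. The choice of RWORD is still a manual judgment; the sketch only checks that it is one of the tokens.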


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some of the expected tokens are not picked up by cTAKES during clinical text extraction. So our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the path below under ctakes/resources for the sources RxNorm and SNOMEDCT_US

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come specifically to our current problem: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the .script file and in the respective tables. Please find a snapshot below.

But still these terms were not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in the future when we try to add new terms to the dictionary.



•       What are the steps if we need to add or edit entries in the existing dictionaries? I see we can add or edit the existing values in the .script files, but our primary doubt is this: suppose I have a term ‘xyz’ to be added to the dictionary; how can I get the CUI and the other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn’t find much documentation for them. Kindly advise which of the 3 options below is better.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add to or edit the available dictionary sno_rx_16ab; in that case, how do we provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV; in that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source like ICD/CPT/MDR. But I was still unable to get the other required attributes, like TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S.: We comply with the UMLS license


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

I should also mention:

By default ctakes uses Sentences as the "range" in which to find terms.  For instance, in the text "There was a lesion in the stomach.  Cancer was not diagnosed."  ctakes will (most likely) split the text into two sentences.  Within the first sentence it could discover "stomach", and in the second sentence it could discover "cancer".  However, it will not even try to discover the term "stomach cancer".  For the text "F84.1" ctakes may determine that there are two sentences: "F84" and "1".

There are a couple of ways to "correct" this.
1.  Use the SentenceDetectorAnnotatorBIO instead of SentenceDetector.  The BIO version is more of a "lumper" while the other is more of a "splitter".  In the piper:
// add SentenceDetector
add SentenceDetectorAnnotatorBIO classifierJarPath=/org/apache/ctakes/core/sentdetect/model.jar

2.  Use paragraphs as the discovery range for the dictionary lookup.  In the piper:
add ParagraphAnnotator
set windowAnnotations=org.apache.ctakes.typesystem.type.textspan.Paragraph
// -- lines for cli if you use them
add DefaultJCasTermAnnotator

3.  Create an annotator that joins the possibly erroneous splits.  You can use MrsDrSentenceJoiner as an example, but check the last characters of a sentence and the first characters of the next sentence for digits, making sure that there is no whitespace between their offsets.
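
The check described in option 3 can be sketched in a few lines of Python. This is only an illustration of the logic, not cTAKES code: a real joiner would be a Java annotator working on Sentence annotations in the CAS, and the function name join_digit_splits and the (begin, end) span representation are assumptions made for this example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (begin, end) character offsets, end exclusive


def join_digit_splits(text: str, sentences: List[Span]) -> List[Span]:
    """Merge neighbouring sentence spans that look like a split inside a code such as F84.1."""
    joined: List[Span] = []
    for begin, end in sentences:
        if joined:
            prev_begin, prev_end = joined[-1]
            gap = text[prev_end:begin]
            # Text up to the split, ignoring the period(s) the splitter stopped on
            before = (text[prev_begin:prev_end] + gap).rstrip(".")
            if (before[-1:].isdigit()                          # previous sentence ends in a digit
                    and text[begin:begin + 1].isdigit()        # next sentence starts with a digit
                    and not any(ch.isspace() for ch in gap)):  # no whitespace between the offsets
                joined[-1] = (prev_begin, end)
                continue
        joined.append((begin, end))
    return joined


text = "Dx F84.1 noted. Seen today."
print(join_digit_splits(text, [(0, 7), (7, 15), (16, 27)]))
```

The whitespace test is what keeps a genuine boundary such as "Age 5. 3 days later" from being merged.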

There may be another issue with the part of speech of a non-word such as "F84" causing it to be ignored as a candidate for lookup. The default exclusion (penn treebank) tags are:
"VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
You would want to remove (at least) "CD" and "LS".  In your piper:
set exclusionTags="VB,VBD,VBG,VBN,VBP,VBZ,CC,DT,EX,IN,MD,PDT,POS,PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
// -- other parameters for dictionary lookup
add DefaultJCasTermAnnotator
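
If the tag list needs further tuning, it may be easier to derive the value from the default list than to edit the string by hand. The following Python sketch does that; the helper name without_tags is made up for this example, and the default list is copied from the message above.

```python
from typing import Iterable

# Default exclusion tags as quoted above
DEFAULT_EXCLUSION_TAGS = (
    "VB,VBD,VBG,VBN,VBP,VBZ,CC,CD,DT,EX,IN,LS,MD,PDT,POS,"
    "PP,PP$,PRP,PRP$,RP,TO,WDT,WP,WPS,WRB"
)


def without_tags(tag_list: str, remove: Iterable[str]) -> str:
    """Return the comma-separated tag list with the given tags dropped, order preserved."""
    dropped = set(remove)
    return ",".join(tag for tag in tag_list.split(",") if tag not in dropped)


# Line to paste into the piper file before the dictionary lookup annotator
print('set exclusionTags="%s"' % without_tags(DEFAULT_EXCLUSION_TAGS, ["CD", "LS"]))
```

Removing "CD" and "LS" this way reproduces the exclusionTags value shown in the piper snippet above.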

It could be that ctakes is tagging things like "F84" as other parts of speech, so you would have to check on that and modify the exclusionTags accordingly.  You can check by adding at the end of your piper:
add pretty.plaintext.PrettyTextWriterFit
and checking the output files that it creates.

I realize that this seems like a lot to check, but dictionary lookup is not a simple beast.

Sean

________________________________________
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 12:04 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL] [SUSPICIOUS]



Hi Abad,


I think that you need to make only one minor change.


ctakes uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  The manner in which this is indicated in the .script file is by adding a space between each token.  This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token count is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
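
To see how the token count and index change once punctuation is split off, here is a rough Python sketch. The regular expression is only an approximation of the cTAKES tokenizer (which has more rules), and the function names tokenize and term_fields are made up for this example. The lowercase option defaults to True because Remy's earlier description says TEXT is lowercase, whereas the entries above keep the capital "F"; which form the lookup expects is worth checking against existing rows in the .script file.

```python
import re
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Keep runs of letters/digits together; every other non-space character is its own token."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)


def term_fields(text: str, rare_word: str, lowercase: bool = True) -> Tuple[str, int, int]:
    """Return (TEXT, TCOUNT, RINDEX) for a dictionary term."""
    if lowercase:
        text, rare_word = text.lower(), rare_word.lower()
    tokens = tokenize(text)
    return " ".join(tokens), len(tokens), tokens.index(rare_word)


print(term_fields("F84.1", "F84", lowercase=False))
print(term_fields("F84.1 pdd", "pdd", lowercase=False))
print(term_fields("F84.1 pdd", "pdd"))
```

With lowercase=False the first two calls give the TEXT, token count and index used in the two entries above.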


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]



Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see that they were picked up by cTAKES. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we have a requirement to extract the ICD code for the respective description. The scenario would be a text like the one below:

‘F84.1 pervasive developmental disorders’

We need cTAKES to recognize F84.1 as a token, or at least as an attribute in one of the ‘IdentifiedAnnotation’ objects. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify ‘F84’ alone as a token, but it does not do so whenever a ‘.’ is encountered. As a result, cTAKES is not able to return ICD codes like F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a ‘.’, this way of tweaking the dictionary is not working. In fact, cTAKES is recognizing the digit after the decimal point within a ‘FractionAnnotation’.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below will make cTAKES recognize the ICD code as a token and return it?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130


Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Abad,

If I am following you, this is a different problem.  

Previously you had the ICD code (for instance) in the text itself.  You wanted ctakes to identify the ICD code in the text and annotate it.
For this, I have no idea why a number in the dictionary would not be discovered.  I think that you have removed all of the filters that would prevent such a thing and you are practically left with pure string matching.
Did you add the PrettyText writer to your pipeline?  Did you check its output?  This kind of data can really help debugging.

Now it seems that you are asking about assigning ICD codes to some annotation discovered in the text, like "cancer".  For this second problem:

1.  You must make sure that you copy not only the INSERT lines, but also the CREATE table and index (on cui).  I am guessing that you did because otherwise hsql should throw an error and ctakes should exit.  I am writing this to attempt a complete answer.

2.  You must modify your dictionary parameters .xml file.  If you are using sno_rx_16ab then it is in the parent directory of the .script file, sno_rx_16ab.xml
Within       <name>sno_rx_16abConcepts</name>     you should see declared properties
         <property key="rxnormTable" value="long"/>
         <property key="snomedct_usTable" value="long"/>
You need to create properties for your codes.  For instance
         <property key="icd10Table" value="text"/>

The value is one of "long", "double", "text" if I remember correctly.  Text can be used for long and double as well as text - but you will want to match your table's column type.


As an aside, since you obtained your ICD and CPT codes separately from the CUIs used in sno_rx_16ab, they won't match up 1:1.  There may be CUIs that don't have a code in your ICD table, but I would bet that there are a lot of ICD codes in that table with CUIs that are not in the main table.  Those will never be used; they just slow down select calls.  You could try to filter your copied INSERT statements by the CUIs that exist in the cui_terms table, but that is up to you.
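
The filtering suggested above could be sketched roughly as follows. This is only an illustration in plain Java, not cTAKES code: the class and method names are hypothetical, the ICD10 table name in the usage note is an assumption, and it assumes every INSERT line carries the numeric CUI as its first value, as in the CUI_TERMS examples elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of cTAKES): keeps only those copied code-table
// INSERT lines whose CUI already appears in the CUI_TERMS table of the main script.
class CodeInsertFilterSketch {

    // Assumption: lines look like "INSERT INTO <TABLE> VALUES(<numeric cui>,..."
    private static final Pattern INSERT =
            Pattern.compile("^INSERT INTO (\\w+) VALUES\\((\\d+),");

    // Collects the CUIs used by CUI_TERMS rows in the main .script file.
    static Set<String> cuisInMainTable(List<String> scriptLines) {
        Set<String> cuis = new HashSet<>();
        for (String line : scriptLines) {
            Matcher m = INSERT.matcher(line);
            if (m.find() && m.group(1).equalsIgnoreCase("CUI_TERMS")) {
                cuis.add(m.group(2));
            }
        }
        return cuis;
    }

    // Returns the code-table INSERT lines whose CUI is in the known set.
    static List<String> keepKnownCuis(List<String> codeInserts, Set<String> knownCuis) {
        List<String> kept = new ArrayList<>();
        for (String line : codeInserts) {
            Matcher m = INSERT.matcher(line);
            if (m.find() && knownCuis.contains(m.group(2))) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

In use, one would read the main sno_rx_16ab.script lines into the first method, then pass the copied ICD10/CPT INSERT lines through the second before merging them. Lines that do not match the assumed layout are dropped, so the pattern should be checked against the real file first.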


Sean

________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Thursday, September 17, 2020 12:45 PM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Sean,

We tried this and unfortunately it did not work. Our main goal is to have cTAKES extract the respective code (SNOMED/RxNorm/ICD/CPT). We noticed an attribute named "code" in the OntologyConceptArray, which holds the respective SNOMEDCT and RxNORM code. Can we expect this attribute to be populated for the newly configured codes as well? We ask because the "code" attribute is not populated for the newly configured ICD/CPT entries. Could you please advise why that would be? Below are the steps we followed to add ICD/CPT to our profile:

1. Generated a custom dictionary using the MetamorphoSys UMLS installation tool (with ICD10 and CPT as sources), leveraging the full set of .rrf files in the meta folder.
2. Using the same .rrf files, generated the .script file (which has the INSERT statements for the ICD10 and CPT tables).
3. Copied the INSERT statements from the newly generated .script file and merged them into the existing sno_rx_16ab.script file.
4. Restarted cTAKES.

We could see that cTAKES detects the newly configured CUIs from ICD/CPT, but the "code" attribute in the OntologyConceptArray is null for the detected ICD and CPT concepts. It would help us if cTAKES returned the "code" attribute for them. Could you please advise whether the steps we followed are correct for adding ICD and CPT? Are any other configuration changes required on our end to get the "code" attribute populated as expected?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Wednesday, September 16, 2020 11:10 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,

Since you are changing code ...


Line #320 of AbstractJcasTermAnnotator:

         final boolean isNonLookup = baseToken instanceof PunctuationToken
                                     || baseToken instanceof NumToken
                                     || baseToken instanceof ContractionToken
                                     || baseToken instanceof SymbolToken;

Comment out:
                                     || baseToken instanceof NumToken


Sean

________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Wednesday, September 16, 2020 1:01 PM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Sean,

We tried setting the exclusion tags="". PFB the changes we did

1. Set the value of String DEFAULT_EXCLUSION_TAGS = ""; in the JCasTermAnnotator.java file.
2. Removed the contents of the <value> tag of the <nameValuePair> whose <name> tag is 'exclusionTags' in UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml.

But still we could see that "97112" was not getting picked up from dictionary. Is there anywhere else we need to try the changes. Kindly advise

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Thank you Sean for the detailed response.  I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens worked, but it required the input text to have the spaces as well. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes the tokens for the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

But we encountered a similar issue when we configured an INSERT entry as below for CPT codes,

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually have no decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it didn't. Could you please advise why?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,


I think that you need to make only one minor change.


ctakes uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token encompassing '.' and another with '1'.  The manner in which this is indicated in the .script file is by adding a space between each token.  This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
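
To make the token counting concrete, here is a minimal sketch of how a raw term could be turned into the spaced form used in the TEXT column. It is not the cTAKES tokenizer: the class and method names are hypothetical, and the rule "every non-alphanumeric character becomes its own token" is a simplifying assumption. Case is left untouched here so the output matches the entries above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of cTAKES): approximates tokenization so that
// dictionary TEXT can be written with a space between tokens.
class TermTokenSketch {

    // Splits on whitespace and makes each non-alphanumeric character a token.
    // Assumption: this only approximates what the real tokenizer does.
    static List<String> tokenize(String term) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else {
                // A separator ends the token collected so far.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Punctuation is kept as its own token; whitespace is dropped.
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c));
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Joins the tokens with single spaces, the form used in the TEXT column.
    static String toDictionaryText(String term) {
        return String.join(" ", tokenize(term));
    }
}
```

With this sketch, "F84.1" becomes the three tokens F84, '.' and 1, and "F84.1 pdd" becomes four tokens with pdd at index 3, which matches the counts in the two entries above. Checking the real token boundaries with the PrettyTextWriter output remains the safer route.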


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you all are doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see that cTAKES picks them up. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario would be a text like the one below:

‘F84.1 pervasive developmental disorders’

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' instances. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match once a '.' is encountered. As a result, cTAKES cannot return ICD codes like F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as an ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below makes cTAKES recognize the ICD code as a token and return it?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed on this situation

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov/metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov/metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. Either the CUI you are adding already exists and you are just adding a synonym, or it does not exist yet. In the first case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the cui, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
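
As a rough illustration of the column rules above, the following sketch derives RINDEX and TCOUNT from an already tokenized TEXT and a chosen RWORD, and doubles single quotes. The class and method names are hypothetical; it assumes the text is already lowercase and tokenized with one space between tokens, and that the CUI is written as a plain number as in the line above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of cTAKES): formats one CUI_TERMS synonym row.
class CuiTermsInsertSketch {

    // tokenizedText: tokens separated by whitespace, e.g. "pap smear".
    // rword: the token chosen as the lookup index word; must occur in the text.
    static String buildInsert(long cui, String tokenizedText, String rword) {
        List<String> tokens = Arrays.asList(tokenizedText.trim().split("\\s+"));
        int rindex = tokens.indexOf(rword);   // RINDEX: first token is 0
        if (rindex < 0) {
            throw new IllegalArgumentException("RWORD is not a token of TEXT: " + rword);
        }
        int tcount = tokens.size();           // TCOUNT: number of tokens in TEXT
        String text = String.join(" ", tokens);
        return "INSERT INTO CUI_TERMS VALUES(" + cui + "," + rindex + "," + tcount
                + ",'" + escape(text) + "','" + escape(rword) + "')";
    }

    // Single quotes are escaped by doubling them.
    static String escape(String value) {
        return value.replace("'", "''");
    }
}
```

Calling buildInsert(200845L, "pap smear", "pap") yields the same line as the one shown above. Picking the least common token as RWORD is still a manual decision; the sketch only checks that the chosen word is actually one of the tokens.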

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI), one line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
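
For a new concept, the extra lines listed above could be generated along these lines. This is a sketch under assumptions, not a description of the actual schema: the names are hypothetical, and it assumes that CUI, TUI and the SNOMED code are all stored as plain numbers while PREFTERM is a quoted string. The column types should be checked against the CREATE TABLE statements in the .script file before using anything like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of cTAKES): builds the additional INSERT lines
// for a CUI that does not exist in the dictionary yet.
class NewConceptInsertSketch {

    // Assumption: TUI and SNOMED code columns are numeric, PREFTERM is text.
    static List<String> buildInserts(long cui, int[] tuis, long snomedCode, String prefTerm) {
        List<String> lines = new ArrayList<>();
        // One TUI row per semantic type of the concept.
        for (int tui : tuis) {
            lines.add("INSERT INTO TUI VALUES(" + cui + "," + tui + ")");
        }
        lines.add("INSERT INTO SNOMEDCT_US VALUES(" + cui + "," + snomedCode + ")");
        // The preferred term is display text only; single quotes are doubled.
        lines.add("INSERT INTO PREFTERM VALUES(" + cui + ",'"
                + prefTerm.replace("'", "''") + "')");
        return lines;
    }
}
```

The generated lines would go below the final "SET SCHEMA PUBLIC" line, together with at least one CUI_TERMS row for the same CUI so that the concept can actually be found.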


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction. Our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the sources RxNorm and SNOMEDCT_US is configured in the below path under ctakes/resources.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come to our specific problem: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       If I look into the .script file , Pap Smear and Mammogram/Mammography is already present in the .script file and in the respective tables. PFB a snapshot as below









But still these were not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctake-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please share what exactly these filters do? This will also help us in the future when we try to add new terms to the dictionary.



•       What are the steps to add/edit entries in the existing dictionaries? I see we can add/edit the existing values in the .script files, but our primary doubt is: if I have a term 'xyz' to add to the dictionary, how can I get the CUI and the other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give random values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom bsv dictionaries but couldn't find much documentation for them. Kindly advise which of the 3 options below is better.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab; in that case, how do we provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom bsv; in that case, how should we add values to the custom bsv? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source like ICD/CPT/MDR. But I was still unable to get the other required attributes like TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov/metathesaurus.html

Kindly advise us on how to proceed on this and correct us if we went wrong somewhere. This would be of great help for us

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Ab...@cognizant.com.

Hi Sean,

We tried this and unfortunately it did not work. Our main goal is to have cTAKES extract the respective code (SNOMED/RxNorm/ICD/CPT). Meanwhile, we noticed an attribute named "code" in the OntologyConceptArray that holds the respective SNOMEDCT and RxNORM code. Can we expect this attribute to be populated for all other newly configured codes as well? We ask because the "code" attribute is not populated for the newly configured ICD/CPT entries. Could you please advise why that would be? Below are the steps we followed to add ICD/CPT to our profile:

1. Generated a custom dictionary using the MetamorphoSys UMLS installation tool (with ICD10 and CPT as sources), leveraging the full set of .rrf files in the meta folder.
2. Using the same .rrf files, generated the .script file (which has the INSERT statements for the ICD10 and CPT tables).
3. Copied the INSERT statements from the newly generated .script file and merged them into the existing sno_rx_16ab.script file.
4. Restarted cTAKES.

We could see that cTAKES detects the newly configured CUIs for ICD/CPT, but the "code" attribute in the OntologyConceptArray is null for the detected ICD and CPT concepts. It would help us if cTAKES returned that "code" attribute for them. Could you please advise whether the steps we followed are correct for adding ICD and CPT? Is any other configuration change required on our end to get the "code" attribute populated for ICD and CPT?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Wednesday, September 16, 2020 11:10 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,

Since you are changing code ...


Line #320 of AbstractJcasTermAnnotator:

         final boolean isNonLookup = baseToken instanceof PunctuationToken
                                     || baseToken instanceof NumToken
                                     || baseToken instanceof ContractionToken
                                     || baseToken instanceof SymbolToken;

Comment out:
                                     || baseToken instanceof NumToken


Sean

________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Wednesday, September 16, 2020 1:01 PM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Sean,

We tried setting exclusionTags="". Please find below the changes we made:

1. Set the value of String DEFAULT_EXCLUSION_TAGS = ""; in the JCasTermAnnotator.java file.
2. Removed the contents of the <value> tag of the <nameValuePair> whose <name> tag is 'exclusionTags' in UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml.

But we could see that "97112" was still not picked up from the dictionary. Is there anywhere else we need to make changes? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Thank you, Sean, for the detailed response. I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens in the entries worked, but it required the input text to have the spaces as well. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes the tokens for the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

But we encountered a similar issue when we configured an INSERT entry as below for CPT codes:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (these usually have no decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it did not. Could you please advise why this happens?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.
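
As a rough illustration of the "query the CPT/ICD table using the fetched CUI" idea, the sketch below does the lookup outside cTAKES by scanning the INSERT lines of a .script file. It is only a sketch under assumptions: the table name CPT, the two-column (CUI, code) row layout and the sample rows are hypothetical, the helper name codes_by_cui is made up, and nothing here explains or changes how cTAKES itself fills the "code" attribute.

```python
import re
from collections import defaultdict

# Matches lines such as: INSERT INTO CPT VALUES(41154,'97112')
# Assumption: code tables hold two columns, the numeric CUI and the code.
_INSERT = re.compile(r"^INSERT INTO (\w+) VALUES\((\d+),(.*)\)$")


def codes_by_cui(script_lines, code_tables):
    """Collect {cui: {table: [codes]}} for the given code tables."""
    found = defaultdict(lambda: defaultdict(list))
    for line in script_lines:
        match = _INSERT.match(line.strip())
        if not match:
            continue
        table, cui, value = match.groups()
        if table in code_tables:
            # Codes may be stored quoted or as bare numbers; drop the quotes.
            found[int(cui)][table].append(value.strip("'"))
    return found


# Hypothetical sample rows, shaped like the entries discussed in this thread.
sample = [
    "INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')",
    "INSERT INTO CPT VALUES(41154,'97112')",
]
codes = codes_by_cui(sample, {"CPT", "ICD10"})
print(dict(codes[41154]))  # {'CPT': ['97112']}
```

The CUI_TERMS row is skipped because only the tables named in code_tables are collected, so the result maps each CUI to the codes stored for it.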


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,


I think that you need to make only one minor change.


cTAKES uses "tokens" for identification and not the actual text. Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'. In the .script file this is indicated by adding a space between each token. This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
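
To make the token counting concrete, here is a minimal sketch that approximates the splitting with a regular expression and derives the token index and token count for an entry. It is not the cTAKES tokenizer: the regex, the lowercasing (following the description elsewhere in this thread of TEXT as a tokenized lowercase string) and the helper names to_dictionary_text and describe are assumptions made for illustration.

```python
import re

# Assumption: runs of letters/digits form one token and every other
# non-space character is a token of its own. The real tokenizer may differ.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]")


def to_dictionary_text(raw: str) -> str:
    """Return the lowercase, space-separated token string for an entry."""
    return " ".join(_TOKEN.findall(raw.lower()))


def describe(raw: str, rword: str):
    """Return (TEXT, index of rword in TEXT, token count)."""
    text = to_dictionary_text(raw)
    tokens = text.split(" ")
    return text, tokens.index(rword), len(tokens)


print(describe("F84.1", "f84"))      # ('f84 . 1', 0, 3)
print(describe("F84.1 pdd", "pdd"))  # ('f84 . 1 pdd', 3, 4)
```

The index and count match the values in the two entries above (0 and 3, then 3 and 4); only the letter case differs, because this sketch lowercases its input.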


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary, and cTAKES picks them up successfully. We now have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario would be a text like the one below:

‘F84.1 pervasive developmental disorders’

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' objects. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match the entry whenever a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition such as the one below makes cTAKES recognize the ICD code as a token and return it?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com>> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. Either the CUI you are adding already exists and you are just adding a synonym, or it does not exist yet. In the first case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
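
The column rules above can be sketched as a small helper that derives RINDEX and TCOUNT from the tokenized text and the chosen rare word. This is only an illustrative sketch, not part of cTAKES: the function name build_cui_terms_insert is made up, and it assumes TEXT is already tokenized, lowercase and space-separated.

```python
def build_cui_terms_insert(cui: int, text: str, rword: str) -> str:
    """Build one CUI_TERMS row from an already tokenized, lowercase entry."""
    tokens = text.split(" ")
    if rword not in tokens:
        raise ValueError(f"RWORD {rword!r} is not a token of TEXT {text!r}")
    rindex = tokens.index(rword)  # 0-based position of RWORD in TEXT
    tcount = len(tokens)          # number of tokens in TEXT

    def quote(value: str) -> str:
        # Single quotes are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"{quote(text)},{quote(rword)})"
    )


print(build_cui_terms_insert(200845, "pap smear", "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

Choosing the rare word is still a manual decision; the helper only checks that it is one of the tokens and computes the two numbers from it.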

If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
      One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
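
For a new concept, the extra lines listed above could be generated together, as in the sketch below. Everything concrete in it is a placeholder: the CUI, TUI, code and preferred term are made-up values, the helper name build_new_concept_inserts is hypothetical, and writing TUIs and codes as bare numbers is an assumption that should be checked against the existing rows of your own .script file.

```python
def quote(value: str) -> str:
    # Single quotes are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"


def build_new_concept_inserts(cui, tuis, code_table, codes, prefterm):
    """Return the TUI, code-table and PREFTERM rows for one new CUI."""
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    lines += [f"INSERT INTO {code_table} VALUES({cui},{code})" for code in codes]
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},{quote(prefterm)})")
    return lines


# Placeholder values only; look up the real ones in the Metathesaurus.
for line in build_new_concept_inserts(
    cui=9999999,
    tuis=[47],
    code_table="SNOMEDCT_US",
    codes=[123456789],
    prefterm="Example disorder",
):
    print(line)
```

A concept with several TUIs or several codes simply yields one row per value. The CUI_TERMS rows for the searchable synonyms are still added separately.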


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com>> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction. Our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the RxNorm and SNOMEDCT_US sources is configured in the below path under ctakes/resources.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. Coming to our current problem: Pap Smear and Mammogram are two clinical terms that cTAKES currently does not recognize in our profile.

•       In the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.









Still, these were not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please share what exactly these filters do? This will also help us in future when we add new terms to the dictionary.



•       What are the steps to add/edit entries in the existing dictionaries? I see we can add/edit the existing values in the .script files, but our primary doubt is this: if I have a term 'xyz' to add to the dictionary, how do I get the CUI and the other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for TUI/CUI/RINDEX/TCOUNT? I also saw options to create custom BSV dictionaries but could not find much documentation for them. Kindly advise which of the 3 options below is better.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (with ICD10, RxNORM and SNOMEDCT_US as sources) and leverage the full set of .rrf files in the meta folder. Is this approach better when there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for each column such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better when there are only a few entries to populate?

o   Use a custom BSV. In that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source such as ICD/CPT/MDR. But I was still unable to get the other required attributes such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Abad,

Since you are changing code ...


Line #320 of AbstractJcasTermAnnotator:

         final boolean isNonLookup = baseToken instanceof PunctuationToken
                                     || baseToken instanceof NumToken
                                     || baseToken instanceof ContractionToken
                                     || baseToken instanceof SymbolToken;

Comment out the NumToken check so that purely numeric tokens are no longer excluded from lookup:
                                     || baseToken instanceof NumToken


Sean

________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Wednesday, September 16, 2020 1:01 PM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Sean,

We tried setting exclusionTags="". Please find below the changes we made:

1. Set String DEFAULT_EXCLUSION_TAGS = ""; in the JCasTermAnnotator.java file.
2. Emptied the <value> of the <nameValuePair> whose <name> is 'exclusionTags' in UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml.

However, "97112" was still not picked up from the dictionary. Is there anywhere else we need to make changes? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Thank you, Sean, for the detailed response. I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens in the entries worked, but it required the input text to have the spaces as well. If the text came in as 'F84.1', cTAKES did not recognize the token, but if the text came in as 'F84 . 1', cTAKES recognized the tokens for the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')

We encountered a similar issue when we configured the INSERT entry below for CPT codes:

INSERT INTO CUI_TERMS VALUES(41154,0,1, '97112','97112')

Here 97112 is a CPT code (which usually has no decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it did not. Could you please advise us on why this happens?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,


I think that you need to make only one minor change.


cTAKES uses "tokens" for identification, not the raw text. Tokenization turns text such as "F84.1" into "F84 . 1": the first token is "F84", followed by one token for "." and another for "1". In the .script file this is indicated by putting a space between each token, which makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')


Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4, 'F84 . 1 pdd', 'pdd')
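
To see why "F84.1" has to be written as "F84 . 1" in the dictionary, here is a minimal Java sketch of the idea. It is only a rough approximation and not the cTAKES tokenizer: the class and method names are hypothetical, and the rule used (every non-alphanumeric character becomes its own token) is an assumption that ignores the special handling cTAKES applies to contractions, hyphens, numbers and so on.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of cTAKES. It only approximates tokenization:
// split on whitespace and make every other non-alphanumeric character a token.
class DictionaryTextSketch {

    static List<String> roughTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else {
                // Any other character ends the token being built.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                // Punctuation is kept as a token of its own; whitespace is dropped.
                if (!Character.isWhitespace(c)) {
                    tokens.add(String.valueOf(c));
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // The TEXT column holds the tokens joined by single spaces.
    static String dictionaryText(String text) {
        return String.join(" ", roughTokens(text));
    }

    public static void main(String[] args) {
        System.out.println(dictionaryText("F84.1"));            // F84 . 1
        System.out.println(roughTokens("F84.1 pdd").size());    // 4
    }
}
```

The token count printed for "F84.1 pdd" matches the 4 used in the entry above. For real data, the PrettyTextWriter output remains the reliable way to see how cTAKES actually tokenizes a given string.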


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary, and cTAKES picked them up successfully. We now have a requirement to configure ICD and CPT as well, so we followed the steps in the cTAKES wiki and generated the corresponding .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for each description. For example, given text such as:

'F84.1 pervasive developmental disorders'

we need cTAKES to recognize F84.1 as a token, or at least expose it as an attribute of one of the 'IdentifiedAnnotation' instances. Based on our earlier experience, we tried tweaking the dictionary by adding synonyms for the existing CUI, as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4, 'F84.1 pervasive developmental disorders', 'pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2, 'F84.1 pdd', 'pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')

cTAKES can identify 'F84' alone as a token, but it does not match once a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point as part of a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as an ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym entry such as the one below is recognized as a token and returned by cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')


Kindly check and advise us on how to proceed.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases: either the CUI you are adding already exists and you are just adding a synonym, or it does not exist yet. In the first case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
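
The field rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names are made up, the caller is assumed to pass tokens that are already tokenized and lowercased, and the choice of the rare word is left to the caller because it is a judgment call.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES: formats one CUI_TERMS line.
class CuiTermsRowSketch {

    // Single quotes are escaped by doubling them.
    static String sqlEscape(String s) {
        return s.replace("'", "''");
    }

    static String cuiTermsInsert(long cui, List<String> tokens, String rareWord) {
        // RINDEX: position of the rare word in the token list, first token is 0.
        int rindex = tokens.indexOf(rareWord);
        if (rindex < 0) {
            throw new IllegalArgumentException("rare word must be one of the tokens");
        }
        // TCOUNT: number of tokens in TEXT.
        int tcount = tokens.size();
        // TEXT: the tokens joined by single spaces.
        String text = String.join(" ", tokens);
        return "INSERT INTO CUI_TERMS VALUES(" + cui + "," + rindex + "," + tcount
                + ",'" + sqlEscape(text) + "','" + sqlEscape(rareWord) + "')";
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
        System.out.println(cuiTermsInsert(200845L, Arrays.asList("pap", "smear"), "pap"));
    }
}
```

Generating the lines this way mainly guards against an RINDEX or TCOUNT that does not match TEXT, which is easy to get wrong when editing the .script file by hand.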

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
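
For a CUI that does not exist yet, the extra lines listed above could be generated along these lines. Treat it as a sketch with assumptions: the class and method names are hypothetical, the CUI, TUI and SNOMED values in main are placeholders rather than real concepts, and writing TUI and SNOMED codes as bare numbers is an assumption you should check against the CREATE TABLE lines in your own .script file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of cTAKES: emits the extra lines for a new CUI.
class NewConceptRowsSketch {

    static String sqlEscape(String s) {
        return s.replace("'", "''");
    }

    static List<String> newConceptInserts(long cui, List<Integer> tuis,
                                          long snomedCode, String prefTerm) {
        List<String> lines = new ArrayList<>();
        // One TUI line per semantic type of the concept.
        for (int tui : tuis) {
            lines.add("INSERT INTO TUI VALUES(" + cui + "," + tui + ")");
        }
        // Assumption: the code column is numeric in this dictionary.
        lines.add("INSERT INTO SNOMEDCT_US VALUES(" + cui + "," + snomedCode + ")");
        // PREFTERM is the display string, one per CUI.
        lines.add("INSERT INTO PREFTERM VALUES(" + cui + ",'" + sqlEscape(prefTerm) + "')");
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        for (String line : newConceptInserts(9000001L, Arrays.asList(47, 61),
                                             123456L, "Example concept")) {
            System.out.println(line);
        }
    }
}
```

The new CUI still needs at least one CUI_TERMS line as described for the first case, otherwise there is nothing for the lookup to match.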


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some of the expected tokens are not picked up by cTAKES during clinical text extraction, so our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the sources RxNorm and SNOMEDCT_US is configured in the below path under ctakes/resources

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. Coming to our current problem specifically: Pap Smear and Mammogram are two clinical terms that cTAKES does not currently recognize in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the .script file and in the respective tables. Please find a snapshot below.

[snapshot]

But cTAKES still did not recognize them. I see there are some filters working on top of the entries available in the dictionary (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.

•       What are the steps to add/edit entries in the existing dictionaries? I see we can add/edit the existing values in the .script files, but our primary doubt is this: if I have a term ‘xyz’ to add to the dictionary, how can I get the CUI and other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn’t find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide ICD10, RxNORM and SNOMEDCT_US as sources) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV. In that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source like ICD/CPT/MDR. But I was still unable to get the other required attributes such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028




RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Ab...@cognizant.com.

Hi Sean,

We tried setting exclusionTags="". Please find below the changes we made:

1. Set String DEFAULT_EXCLUSION_TAGS = ""; in the JCasTermAnnotator.java file.
2. Removed the contents of the <value> tag of the <nameValuePair> whose <name> is 'exclusionTags' in UmlsLookupAnnotator.xml and UmlsOverlapLookupAnnotator.xml.

But "97112" was still not picked up from the dictionary. Is there anywhere else we need to make changes? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""
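
The effect of an exclusion list on a pure number can be pictured with a small sketch. This is not cTAKES code: the function name, the sample sentence, and the tag set are illustrative assumptions, and the real default exclusion list should be checked in your cTAKES version. It only shows why a token tagged as a cardinal number (CD) never reaches the lookup while that tag is excluded.

```python
# Hypothetical illustration only; this is not how cTAKES is implemented.
# EXAMPLE_EXCLUDED_TAGS is an assumed sample set, not the real cTAKES default.
EXAMPLE_EXCLUDED_TAGS = {"CD", "DT", "IN"}


def lookup_candidates(tagged_tokens, exclusion_tags):
    """Return the token texts that are allowed to take part in lookup.

    tagged_tokens: list of (token_text, part_of_speech_tag) pairs.
    exclusion_tags: set of part-of-speech tags whose tokens are skipped.
    """
    return [text for text, tag in tagged_tokens if tag not in exclusion_tags]


# A made-up tagged sentence in which the CPT code is tagged as a number.
sentence = [("billed", "VBN"), ("under", "IN"), ("97112", "CD")]

with_exclusions = lookup_candidates(sentence, EXAMPLE_EXCLUDED_TAGS)
without_exclusions = lookup_candidates(sentence, set())

print(with_exclusions)
print(without_exclusions)
```

With the sample exclusion set the number is dropped before any dictionary comparison, and with an empty set it is kept, which is the behaviour the empty exclusionTags setting is meant to test.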

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Thank you, Sean, for the detailed response. I think we miscommunicated the requirement. Your solution of adding spaces between the tokens of the entries worked, but it required the input text to have the spaces as well. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes the tokens for the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

But we encountered a similar issue when we configured an INSERT entry as below for CPT codes:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

where 97112 is a CPT code (which usually has no decimal point or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it did not. Could you please advise us on why this happens?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,


I think that you need to make only one minor change.


cTAKES uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  The .script file indicates this with a space between each token.  This makes the full entry:

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:

INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
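
To see how the spaced form and the token count fall out of the text, here is a rough sketch. The regular expression is an assumed simplification (runs of letters and digits form one token, every other non-space character is its own token); the actual cTAKES tokenizer has more rules, so its output should be confirmed with the PrettyTextWriter before entries are written. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Split text into alphanumeric runs and single punctuation marks.

    Simplified stand-in for a real tokenizer: 'F84.1' -> ['F84', '.', '1'].
    """
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)


def dictionary_text(text):
    """Join the tokens with single spaces, as the TEXT column expects."""
    return " ".join(tokenize(text))


code_tokens = tokenize("F84.1")
entry_tokens = tokenize("F84.1 pdd")

print(dictionary_text("F84.1"), len(code_tokens))
print(dictionary_text("F84.1 pdd"), entry_tokens.index("pdd"), len(entry_tokens))
```

Under these assumptions the token counts (3 and 4) and the position of 'pdd' (3) agree with the two INSERT lines above.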


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary, and cTAKES picked them up successfully. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we have a requirement to extract the ICD code for the respective description. So the scenario would be, for a text like the one below:

'F84.1 pervasive developmental disorders'

we would need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' objects. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match the entry whenever a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym added as below is recognized as a token for the ICD code and returned from cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM code of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hello Abad,

The short answer is, yes, the sno_rx_16ab dictionary can be "hacked". Two caveats: any mistake can stop all recognition, and you will lose all your modifications on updates. So an additional dictionary is the recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
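
As a cross-check on hand-written lines, the column rules above can be sketched as a small helper. This is only a convenience under stated assumptions: the function names are hypothetical, the caller supplies TEXT already tokenized with single spaces, and the caller chooses RWORD (the sketch does not try to work out which token is least common).

```python
def sql_quote(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def cui_terms_insert(cui, tokenized_text, rword):
    """Build one CUI_TERMS line from a numeric CUI, a tokenized TEXT and RWORD.

    RINDEX is the zero-based position of RWORD in TEXT and TCOUNT is the
    number of tokens. A ValueError is raised if RWORD is not one of the tokens.
    """
    tokens = tokenized_text.lower().split()
    rword = rword.lower()
    rindex = tokens.index(rword)
    tcount = len(tokens)
    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rindex, tcount, sql_quote(" ".join(tokens)), sql_quote(rword)
    )


statement = cui_terms_insert(200845, "pap smear", "pap")
print(statement)
```

For the 'pap smear' example this reproduces the line given above. Raising an error when RWORD is missing from TEXT is a deliberate choice, since a mismatched index is the kind of mistake that could silently break matching.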

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
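
The extra lines for a new concept could be generated along the following lines. This is a sketch under assumptions: the helper name is hypothetical, the CUI, TUI and SNOMED values below are placeholders rather than real identifiers, and writing CUI, TUI and code values as plain numbers is an assumption that should be compared with existing lines in your own .script file.

```python
def sql_quote(value):
    """Wrap a string in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def new_concept_inserts(cui, tuis, snomed_codes, prefterm):
    """Return the TUI, SNOMEDCT_US and PREFTERM lines for one new CUI.

    One TUI line is produced per semantic type and one SNOMEDCT_US line per
    code. The CUI_TERMS synonym lines are not produced here.
    """
    lines = ["INSERT INTO TUI VALUES({},{})".format(cui, tui) for tui in tuis]
    lines += [
        "INSERT INTO SNOMEDCT_US VALUES({},{})".format(cui, code)
        for code in snomed_codes
    ]
    lines.append(
        "INSERT INTO PREFTERM VALUES({},{})".format(cui, sql_quote(prefterm))
    )
    return lines


# Placeholder values for illustration only; none of these are real identifiers.
example_lines = new_concept_inserts(9999999, [47, 33], [123456], "Example concept")
for line in example_lines:
    print(line)
```

The generated lines would still need to be pasted below the final "SET SCHEMA PUBLIC" line, together with at least one CUI_TERMS line so that the new concept has something to match on.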


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction. So our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the below path under ctakes/resources for the sources RxNorm and SNOMEDCT_US.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come to our specific problem: Pap Smear and Mammogram are two clinical terms that are currently not recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the .script file and in the respective tables. Please find a snapshot below.

But these were still not recognized by cTAKES. I see there are some filters working on top of the available entries in the dictionary (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.



•       What are the steps if we need to add/edit entries in the existing dictionaries? I see we can add/edit the existing values in the .script files, but our primary doubt is: suppose I have a term 'xyz' to be added to the dictionary, how can I get the CUI and other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom bsv dictionaries but couldn't find much documentation for them. Kindly advise which is the better option of the 3 below.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to be populated?

o   Add/edit the available dictionary sno_rx_16ab; in that case, how do we provide valid values for each column such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to be populated?

o   Use a custom bsv; in that case, how should we add values to the custom bsv? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for the terms and get the CUI and the respective source such as ICD/CPT/MDR. But I was still unable to get the other required attributes such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028




Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.

Thanks Tim.

I've been experimenting with the Penn Treebank and see some potential for
using it as a powerful disambiguation tool. The hard part is finding a
heuristic that minimizes the number of cases where the "big guns" need to
be brought in, because, yes, it would really slow things down.

Peter

On Tue, Sep 15, 2020 at 12:54 PM Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> Peter,
> The parts of speech come from the ctakes-pos-tagger module, which uses
> the OpenNLP pos tagger trained on clinical data. There is a
> constituency parser as well, which I think in theory can tag even
> better (that might be able to get you a unary branch in a tree from NN
> -> CD -> <number>.), but is a lot slower than the pos tagger and we
> probably don't want to make it necessary to run for simple dictionary
> pipelines.
> Tim
>
> On Tue, 2020-09-15 at 12:12 -0700, Peter Abramowitsch wrote:
> > * External Email - Caution *
> >
> >
> > Sean, this conversation raises a question that I've had for a
> > while. Does the term-finding mechanism actually use a treebank to
> > find the POS, or does it use another, less rigorous approach? If it
> > were rigorous, wouldn't it be able to tag a pure number as an NN in
> > the role of object if it played the corresponding role in the
> > sentence?
> >
> > I've not had the same problem as Ayyub,  but I have been wondering
> > why one
> > needed to disable the identification of cm as a genetic acronym
> > because of
> > situations where clearly cm is part of a unit of measure and would
> > show up
> > as an entity's modifier in a treebank.
> >
> > Does the question make sense?
> >
> > Peter
> >
> > On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu>
> > wrote:
> >
> > > I should mention that going the Paragraph route would only impact
> > > term
> > > lookup.
> > > ________________________________________
> > > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > > Sent: Tuesday, September 15, 2020 11:54 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Thank you Sean for the response. We shall definitely try that way.
> > > I have
> > > one question on the "f84.1" problem, since we have now developed a
> > > lot of
> > > features based on the output from cTAKES, is the impact of changing
> > > the
> > > sentenceDetectorAnnotator going to be huge?
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Finan, Sean <Se...@childrens.harvard.edu>
> > > Sent: Tuesday, September 15, 2020 9:06 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > [External]
> > >
> > >
> > > Hi Abad,
> > >
> > > The first thing that I would try for the "97112" problem is
> > > changing the
> > > parts of speech that are ignored for lookup.  Right now a pure
> > > number is
> > > ignored - it is not a word.  So, similar to what I said in my
> > > previous
> > > email, change the dictionary lookup parameter exclusionTags.  But
> > > to make
> > > sure that you get everything, you can first try no exclusions:
> > > set exclusionTags=""
> > >
> > > My guess with the F84.1 problem is that your sentence splitter is
> > > splitting "F84.1" but not splitting "F84 . 1".
> > >
> > > I think that the best way to start debugging is adding the
> > > PrettyTextWriter to the end of the piper and looking at its output
> > > (see my
> > > previous email).   It will print each sentence on a line and
> > > indicate the
> > > part of speech for each token.  If you can quickly and easily see
> > > what the
> > > system is doing then you might start to understand what needs to be
> > > changed
> > > to fit your data.
> > >
> > > Sean
> > > ________________________________________
> > > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > > Sent: Tuesday, September 15, 2020 11:15 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Thank you Sean for the detailed response. I think there was a
> > > miscommunication from our end about the requirement. Your solution
> > > of adding spaces between the entries worked, but it required the
> > > input text to have the spaces as well. If the text came in as
> > > 'F84.1', cTAKES did not recognize the token, but if the text came
> > > in as 'F84 . 1', cTAKES recognized the tokens for the INSERT
> > > script below.
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
> > >
> > > But we encountered a similar issue when we configured an INSERT
> > > entry as
> > > below for CPT codes,
> > >
> > > INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')
> > >
> > > Here 97112 is a CPT code (which usually doesn't have decimals or
> > > '.'). We expected cTAKES to recognize the CPT code '97112' as a
> > > separate token, but it didn't. Could you please advise us on why
> > > this issue came up?
> > >
> > > Is there something wrong in the configuration? Do we need
> > > something additional for cTAKES to recognize the code alone as a
> > > separate token? Is there any other way to get the respective
> > > ICD/CPT code of the identified annotation from cTAKES, such as
> > > querying the CPT/ICD table using the fetched CUI? Kindly advise.
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Finan, Sean <Se...@childrens.harvard.edu>
> > > Sent: Monday, September 14, 2020 9:35 PM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > [External]
> > >
> > >
> > > Hi Abad,
> > >
> > >
> > > I think that you need to make only one minor change.
> > >
> > >
> > > cTAKES uses "tokens" for identification and not the actual text.
> > > Tokenization turns text such as "F84.1" into "F84 . 1". The first
> > > token is F84, followed by a token encompassing '.' and another with
> > > '1'. This is indicated in the .script file by adding a space
> > > between each token, which makes the full entry:
> > >
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
> > >
> > >
> > > Notice that the token length is now 3 and the full text contains
> > > the
> > > between-token spaces.  This would carry forward for the other
> > > entries, such
> > > as:
> > >
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
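
The token counts in the entries above can be checked with a small script. This is only a rough sketch: the regular expression is an assumed stand-in for the cTAKES tokenizer (runs of letters and digits stay together, every other non-space character becomes its own token), and `tokenize` and `dictionary_text` are hypothetical helper names, not part of cTAKES.

```python
import re

# Assumed approximation of the tokenizer behaviour described above:
# runs of ASCII letters/digits form one token, and each remaining
# non-space character (such as '.') is a token of its own.
# This is NOT the actual cTAKES tokenizer.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def tokenize(text):
    """Split raw text into a list of tokens."""
    return _TOKEN.findall(text)


def dictionary_text(text):
    """Return (space-separated TEXT, token count) for a dictionary row."""
    tokens = tokenize(text)
    return " ".join(tokens), len(tokens)


print(dictionary_text("F84.1"))      # ('F84 . 1', 3)
print(dictionary_text("F84.1 pdd"))  # ('F84 . 1 pdd', 4)
```

The real tokenizer may treat some punctuation differently, so the PrettyTextWriter output remains the authoritative check of how a given string is split.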
> > >
> > >
> > > Sean
> > >
> > >
> > > ________________________________
> > > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > > Sent: Monday, September 14, 2020 11:32 AM
> > > To: dev@ctakes.apache.org
> > > Subject: RE: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Hi Team,
> > >
> > > I hope you all are doing good. With your support ,We were able to
> > > successfully add our required synonyms into existing dictionary and
> > > could
> > > see that it was getting successfully picked up by cTAKES. Now we
> > > have a
> > > requirement to configure the ICD and CPT also, where we followed
> > > the steps
> > > as mentioned in cTAKES wiki and generated the respective .script
> > > file.
> > >
> > > The newly created dictionary which comprises of
> > > SNOMEDCT_US,RxNORM,ICD10,CPT are identifying the descriptions as
> > > expected
> > > but we have a requirement to extract the ICD code for the
> > > respective
> > > description . so the scenario would be like for a text like below
> > >
> > > ‘F84.1 pervasive developmental disorders’
> > >
> > > We would need cTAKES to reckon F84.1 as a token or at least as an
> > > attribute in any of the ‘IdentifiedAnnotation’. So for achieving
> > > the same
> > > based on our prior experience we tried to tweak the dictionary
> > > where we
> > > added a synonym for the existing CUI as below
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
> > > INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
> > > INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
> > >
> > > We have seen that cTAKES can identify 'F84' alone as a token, but
> > > it does not treat the code as a single token whenever a '.' is
> > > encountered. As a result, cTAKES cannot return ICD codes such as
> > > F84.1 or M25.6 as separate tokens. Since almost all ICD codes
> > > contain a '.', this way of tweaking the dictionary is not working.
> > > In fact, cTAKES recognizes the digit after the decimal point
> > > within a 'FractionAnnotation'.
> > >
> > > Does cTAKES have the capability to return a code such as the ICD
> > > code, either as an individual token or as an attribute of one of
> > > the tokens?
> > >
> > > Is there any other way in which the dictionary can be tweaked so
> > > that a synonym addition such as the one below will make cTAKES
> > > recognize the ICD code as a token and return it?
> > >
> > > INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
> > >
> > >
> > > Kindly check and advise us on how to proceed on this situation
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > From: Remy Sanouillet <re...@foreseemed.com>
> > > Sent: Tuesday, June 2, 2020 7:23 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES
> > >
> > > [External]
> > > Hi Abad,
> > >
> > > •       How can we point cTAKES application to multiple
> > > dictionaries.
> > > Currently only sno_rx_16ab is pointed to the application, how can I
> > > tweak
> > > it to point that to multiple dictionary simultaneously. Or you
> > > meant to say
> > > create a fresh dictionary with all the vocabularies and point just
> > > that in
> > > cTAKES.
> > >
> > > If you go back in the archive a bit, you should find a thread where
> > > I went
> > > into detail on how to add multiple dictionaries. Combining all
> > > dictionaries
> > > into a fresh dictionary is not recommended for obvious reasons. If
> > > you
> > > can't find the thread, I will dig it up.
> > >
> > > •       So for these edits I will have to add INSERT queries to
> > > respective
> > > tables in the sno_rx_16ab.script file right? Do I need to make any
> > > more
> > > changes for these tokens to get reflected in cTAKES.
> > >
> > > Nope! That is all that is needed and next time you launch cTakes,
> > > it
> > > should recognize your new entries.
> > >
> > > •       If it is a non-existing CUI, I can get the respective
> > > CUI, TUI from here
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > > right?
> > >
> > > Correct! Remember that the ontology has multiple-inheritance so you
> > > need
> > > to grab all the TUIs for a given CUI.
> > >
> > > •       Based on the source I will have to add entry to respective
> > > table
> > > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > > one of it
> > > and not in all. Correct me if am wrong on this understanding
> > >
> > > That is also correct. And most of the time, the dictionaries only
> > > contain
> > > one CODE table so it is not even a question. However, sno_rx_16ab
> > > is an
> > > exception with both a CODE table for SNOMEDCT_US and RXNORM. They
> > > mostly do
> > > not overlap. I do remember that there were a couple of exceptions
> > > but, in
> > > the case where that happens, the metathesaurus will show it.
> > > For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes
> > > (372815001 and 68088000) *and* an RXNORM of 149.
> > >
> > > •       PREFTERM table will be having only one entry for each CUI
> > > right?
> > > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > > Correct me
> > > if am wrong on this understanding.
> > >
> > > You are correct here also. It is a one-to-one mapping although the
> > > system
> > > appears to tolerate when the PREFTERM is missing.
> > >
> > > Rémy Sanouillet
> > > NLP Engineer
> > > remys@foreseemed.com<ma...@foreseemed.com>
> > >
> > >
> > >
> > > ForeSee Medical, Inc.
> > > 12555 High Bluff Drive, Suite 100
> > > San Diego, CA 92130
> > >
> > > NOTICE: This e-mail message and all attachments transmitted with it
> > > are
> > > intended solely for the use of the addressee and may contain
> > > legally
> > > privileged and confidential information. If the reader of this
> > > message is
> > > not the intended recipient, or an employee or agent responsible for
> > > delivering this message to the intended recipient, you are hereby
> > > notified
> > > that any dissemination, distribution, copying, or other use of this
> > > message
> > > or its attachments is strictly prohibited. If you have received
> > > this
> > > message in error, please notify the sender immediately by replying
> > > to this
> > > message and please delete it from your computer.
> > >
> > >
> > > On Mon, Jun 1, 2020 at 7:56 AM <Abad.Ayyub@cognizant.com<mailto:
> > > Abad.Ayyub@cognizant.com>> wrote:
> > > Thank you Remy and Peter for your responses. I hope you guys are
> > > doing
> > > good and safe in this lock down period. Could you pls. help me on
> > > my below
> > > queries in creating an additional dictionary.
> > >
> > >
> > > •       How to create additional dictionary. You meant to say using
> > > the
> > > UMLS tool , so that using that tool we create .script files from
> > > .RRF files?
> > >
> > > •       How can we point cTAKES application to multiple
> > > dictionaries.
> > > Currently only sno_rx_16ab is pointed to the application, how can I
> > > tweak
> > > it to point that to multiple dictionary simultaneously. Or you
> > > meant to say
> > > create a fresh dictionary with all the vocabularies and point just
> > > that in
> > > cTAKES.
> > >
> > > I hope Remy was explaining editing the existing dictionary where I
> > > would
> > > deal with two scenarios where one was with existing CUI and other
> > > was with
> > > Non-existing CUI. Could you pls. resolve the below queries
> > > regarding the
> > > same.
> > >
> > >
> > > •       So for these edits I will have to add INSERT queries to
> > > respective
> > > tables in the sno_rx_16ab.script file right? Do I need to make any
> > > more
> > > changes for these tokens to get reflected in cTAKES.
> > >
> > > •       If it is a non-existing CUI, I can get the respective
> > > CUI, TUI from here
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > > right?
> > >
> > > •       Based on the source I will have to add entry to respective
> > > table
> > > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > > one of it
> > > and not in all. Correct me if am wrong on this understanding
> > >
> > > •       PREFTERM table will be having only one entry for each CUI
> > > right?
> > > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > > Correct me
> > > if am wrong on this understanding.
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > From: Remy Sanouillet <remys@foreseemed.com<mailto:
> > > remys@foreseemed.com>>
> > > Sent: Friday, May 29, 2020 9:25 PM
> > > To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
> > > Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
> > > Subject: Re: Building a new custom dictionary or Updating/Adding
> > > values to
> > > the existing dictionary in cTAKES
> > >
> > > [External]
> > > Hello Abad,
> > >
> > > The short answer is, yes, the sno_rx_16ab can be "hacked". A couple
> > > of
> > > caveats are that any mistake can stop all recognition and you will
> > > lose all
> > > your mods on updates. So an additional dictionary is a recommended
> > > approach.
> > >
> > > There are two cases. In the first, the CUI you are adding already
> > > exists and you are just adding a synonym. In that case, you only
> > > need to add one line:
> > > INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
> > > where:
> > >
> > >   *   CUI is the cui, nuf'said
> > >   *   TEXT is the tokenized lowercase string for the entry. In your
> > > case
> > > 'pap smear'. Most punctuation is a separate token. Single quotes
> > > are
> > > escaped by doubling them
> > >   *   RWORD is the one token in TEXT that is the most indicative
> > > (least
> > > common) which will be used as the index in the lookup. In your case
> > > probably 'pap' since it is not as common as 'smear'
> > >   *   RINDEX is the index of RWORD in TEXT. First token is 0 which
> > > is the
> > > case for 'pap'
> > >   *   TCOUNT is the token count for TEXT. In your case, 2
> > > So you would want to add:
> > > INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
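
The column rules above can be sketched as a small helper that derives RINDEX and TCOUNT from the tokens instead of counting by hand. `cui_terms_insert` is a hypothetical name used only for illustration; it assumes the text has already been split into tokens the same way cTAKES would split it, and that the CUI is written as a plain number, as in the example line.

```python
def cui_terms_insert(cui, tokens, rare_word):
    """Build one CUI_TERMS line for a dictionary .script file.

    cui       -- numeric CUI, written as in the .script file (e.g. 200845)
    tokens    -- the synonym, already split into tokens
    rare_word -- the token to index on (RWORD); must be one of the tokens
    """
    tokens = [t.lower() for t in tokens]   # TEXT is the lowercase form
    rare_word = rare_word.lower()
    rindex = tokens.index(rare_word)       # ValueError if RWORD is not in TEXT
    tcount = len(tokens)
    # Single quotes inside a value are escaped by doubling them.
    text = " ".join(tokens).replace("'", "''")
    rword = rare_word.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"'{text}','{rword}')"
    )


print(cui_terms_insert(200845, ["pap", "smear"], "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

Taking RINDEX from `tokens.index(...)` keeps it in step with TEXT, which guards against the off-by-one mistakes that are easy to make when editing these rows manually.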
> > >
> > >  If the entry is a non-existing one, you will need to add a few
> > > more
> > > lines. Their positions are unimportant as long as they are below
> > > the header
> > > lines (below the final "SET SCHEMA PUBLIC" line).
> > >
> > >   1.  INSERT INTO TUI VALUES(CUI,TUI)
> > >       One line for each TUI in the taxonomy.
> > >   2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
> > >       Assuming you are adding a SNOMED code.
> > >   3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
> > >       PREFTERM is the pretty string to describe the entry. It need
> > >       not correspond to any indexed entry. It is used for display
> > >       once the lookup has been successful.
> > >
> > > That's it. Use at your own discretion. No guarantees.
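
For a CUI that is not yet in the dictionary, the extra rows listed above could be generated along these lines. This is a sketch under assumptions: `new_concept_inserts` is a hypothetical helper, the CUI, TUI and SNOMED values in the usage line are made-up placeholders, and TUIs and codes are written as unquoted numbers, so compare with existing rows in your own .script file before copying the format.

```python
def new_concept_inserts(cui, tuis, snomed_code, prefterm):
    """Return the extra .script lines needed for a brand-new CUI.

    cui         -- numeric CUI
    tuis        -- every TUI for the concept (one TUI row each)
    snomed_code -- SNOMEDCT_US code for the concept
    prefterm    -- display string shown once a lookup succeeds
    """
    # One TUI row per semantic type, since a CUI can have several.
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    lines.append(f"INSERT INTO SNOMEDCT_US VALUES({cui},{snomed_code})")
    # Single quotes inside a value are escaped by doubling them.
    escaped = prefterm.replace("'", "''")
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},'{escaped}')")
    return lines


# Placeholder values for illustration only; they are not real UMLS data.
for line in new_concept_inserts(9999999, [60, 59], 123456, "Example procedure"):
    print(line)
```

The CUI_TERMS rows for the concept's synonyms are still needed in addition to these lines, and all of them go below the final "SET SCHEMA PUBLIC" header line.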
> > >
> > >
> > > Rémy Sanouillet
> > > NLP Engineer
> > > remys@foreseemed.com<ma...@foreseemed.com>
> > >
> > >
> > >
> > > ForeSee Medical, Inc.
> > > 12555 High Bluff Drive, Suite 100
> > > San Diego, CA 92130
> > >
> > >
> > >
> > > On Fri, May 29, 2020 at 7:34 AM <Abad.Ayyub@cognizant.com<mailto:
> > > Abad.Ayyub@cognizant.com>> wrote:
> > > Hi Team,
> > >
> > > We set up cTAKES4.0.0 as our NLP engine for our profile recently .
> > > We have
> > > faced situations where some of the expected tokens are not picked
> > > up by
> > > cTAKES during clinical text extraction. So our first thought
> > > process was to
> > > identify where the dictionary is configured and how that can be
> > > updated.
> > > After some code analysis  it was found that the dictionary is
> > > configured in
> > > the  below path under ctakes/resources for sources RxNorm and
> > > SNOMEDCT_US
> > >
> > > We were able to open the HSQLDB database using the HSQLDB GUI and
> > > found that some of our required entries are already there. Coming
> > > specifically to our current problem: Pap Smear and Mammogram are
> > > two clinical terms that are not currently recognized by cTAKES in
> > > our profile.
> > >
> > > •       If I look into the .script file , Pap Smear and
> > > Mammogram/Mammography is already present in the .script file and in
> > > the
> > > respective tables. PFB a snapshot as below
> > >
> > >
> > > But these terms were still not recognized by cTAKES. I see there
> > > are some filters working on top of the available dictionary
> > > entries (ctakes-gui and ctakes-gui-res). Could these filters be
> > > the reason the tokens are not recognized as expected? Could you
> > > please tell us what exactly these filters do? This will also help
> > > us in the future when we try to add new terms to the dictionary.
> > >
> > >
> > >
> > > •       What are the steps to do if we need to add/edit entries
> > > into the
> > > existing dictionaries. I see we can add/edit the existing values in
> > > .scripts files but  our primary doubt is if suppose I have a term
> > > ‘xyz’ to
> > > be added to dictionary how can I get the CUI and other values like
> > > TUI,RINDEX,TCOUNT and PREFTERM. Is it fine if I can give any random
> > > value
> > > for the TUI/CUI/RINDEX/TCOUNT. I could also see options to create
> > > custom
> > > bsv dictionaries but couldn’t see much documentation for it. Kindly
> > > advise
> > > which is the better option from the below 3.
> > >
> > >
> > >
> > > o   Generate a custom dictionary using the MetamorphoSys UMLS
> > > installation tool (where we provide the sources ICD10, RxNORM and
> > > SNOMEDCT_US) and leverage the full set of .rrf files in the META
> > > folder. Is this approach better if a large number of entries is
> > > to be populated?
> > >
> > > o   Add/edit the available dictionary sno_rx_16ab and in that case
> > > how to
> > > provide valid values for each columns like CUI, TUI,RINDEX,TCOUNT
> > > and
> > > PREFTERM. If the entries to be populated are minimal is this
> > > approach would
> > > be better?.
> > >
> > > o   Use a custom bsv , in that case how should we add  values to
> > > custom
> > > bsv. Could you also provide a sample in that case.
> > >
> > > I found a Metathesaurus browser in the below url , where I can
> > > search for
> > > the terms and get the CUI  and the respective source like
> > > ICD/CPT/MDR. But
> > > still I was unable to get the other required attributes to  be
> > > populated
> > > like TUI,RINDEX,TCOUNT and PREFTERM. Could you pls. brief what
> > > these
> > > attributes signifies
> > >
> > >
> > >
> > > https://uts.nlm.nih.gov/metathesaurus.html
> > >
> > > Kindly advise us on how to proceed on this and correct us if we
> > > went wrong
> > > somewhere. This would be of great help for us
> > >
> > > P.S : We comply with UMLS license
> > >
> > >
> > > Thanks & Regards
> > >
> > > Abad Ayyub
> > > Vnet: 406170 | Cell : +91-9447379028
> > >
> > >
> > >
> > > This e-mail and any files transmitted with it are for the sole use
> > > of the
> > > intended recipient(s) and may contain confidential and privileged
> > > information. If you are not the intended recipient(s), please reply
> > > to the
> > > sender and destroy all copies of the original message. Any
> > > unauthorized
> > > review, use, disclosure, dissemination, forwarding, printing or
> > > copying of
> > > this email, and/or any action taken in reliance on the contents of
> > > this
> > > e-mail is strictly prohibited and may be unlawful. Where permitted
> > > by
> > > applicable law, this e-mail and other e-mail communications sent to
> > > and
> > > from Cognizant e-mail addresses may be monitored.
> > > This e-mail and any files transmitted with it are for the sole use
> > > of the
> > > intended recipient(s) and may contain confidential and privileged
> > > information. If you are not the intended recipient(s), please reply
> > > to the
> > > sender and destroy all copies of the original message. Any
> > > unauthorized
> > > review, use, disclosure, dissemination, forwarding, printing or
> > > copying of
> > > this email, and/or any action taken in reliance on the contents of
> > > this
> > > e-mail is strictly prohibited and may be unlawful. Where permitted
> > > by
> > > applicable law, this e-mail and other e-mail communications sent to
> > > and
> > > from Cognizant e-mail addresses may be monitored.
> > > This e-mail and any files transmitted with it are for the sole use
> > > of the
> > > intended recipient(s) and may contain confidential and privileged
> > > information. If you are not the intended recipient(s), please reply
> > > to the
> > > sender and destroy all copies of the original message. Any
> > > unauthorized
> > > review, use, disclosure, dissemination, forwarding, printing or
> > > copying of
> > > this email, and/or any action taken in reliance on the contents of
> > > this
> > > e-mail is strictly prohibited and may be unlawful. Where permitted
> > > by
> > > applicable law, this e-mail and other e-mail communications sent to
> > > and
> > > from Cognizant e-mail addresses may be monitored.
> > >
>

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Peter,
The parts of speech come from the ctakes-pos-tagger module, which uses
the OpenNLP POS tagger trained on clinical data. There is a
constituency parser as well, which I think in theory can tag even
better (it might be able to get you a unary branch in a tree from NN
-> CD -> <number>), but it is a lot slower than the POS tagger and we
probably don't want to make it necessary to run for simple dictionary
pipelines.
Tim

On Tue, 2020-09-15 at 12:12 -0700, Peter Abramowitsch wrote:
> Sean, this conversation raises a question that I've had for a
> while. Does the term-finding mechanism actually use a treebank to
> find the POS, or does it use another, less rigorous approach? If it
> were rigorous, wouldn't it be able to tag a pure number as an NN in
> the role of object if it played the corresponding role in the
> sentence?
> 
> I've not had the same problem as Ayyub, but I have been wondering
> why one needed to disable the identification of cm as a genetic
> acronym because of situations where cm is clearly part of a unit of
> measure and would show up as an entity's modifier in a treebank.
> 
> Does the question make sense?
> 
> Peter
> 
> On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu>
> wrote:
> 
> > I should mention that going the Paragraph route would only impact
> > term
> > lookup.
> > ________________________________________
> > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > Sent: Tuesday, September 15, 2020 11:54 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you, Sean, for the response. We shall definitely try that.
> > I have one question on the "F84.1" problem: since we have now
> > developed a lot of features based on the output from cTAKES, is the
> > impact of changing the sentenceDetectorAnnotator going to be huge?
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > -----Original Message-----
> > From: Finan, Sean <Se...@childrens.harvard.edu>
> > Sent: Tuesday, September 15, 2020 9:06 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Abad,
> > 
> > The first thing that I would try for the "97112" problem is
> > changing the
> > parts of speech that are ignored for lookup.  Right now a pure
> > number is
> > ignored - it is not a word.  So, similar to what I said in my
> > previous
> > email, change the dictionary lookup parameter exclusionTags.  But
> > to make
> > sure that you get everything, you can first try no exclusions:
> > set exclusionTags=""
> > 
> > My guess with the F84.1 problem is that your sentence splitter is
> > splitting "F84.1" but not splitting "F84 . 1".
> > 
> > I think that the best way to start debugging is adding the
> > PrettyTextWriter to the end of the piper and looking at its output
> > (see my
> > previous email).   It will print each sentence on a line and
> > indicate the
> > part of speech for each token.  If you can quickly and easily see
> > what the
> > system is doing then you might start to understand what needs to be
> > changed
> > to fit your data.
> > 
> > Sean
> > ________________________________________
> > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > Sent: Tuesday, September 15, 2020 11:15 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Thank you, Sean, for the detailed response. I think there was a
> > miscommunication from our end about the requirement. Your solution
> > of adding spaces between the tokens worked, but it required the
> > input text to have the spaces as well. If the text comes in as
> > 'F84.1', cTAKES does not recognize the token, but if the text comes
> > in as 'F84 . 1', cTAKES recognizes the tokens for the INSERT
> > statement below.
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')
> > 
> > We encountered a similar issue when we configured the INSERT entry
> > below for CPT codes:
> > 
> > INSERT INTO CUI_TERMS VALUES(41154,0,1, '97112','97112')
> > 
> > Here 97112 is a CPT code (CPT codes usually don't contain a '.').
> > We expected cTAKES to recognize the CPT code '97112' as a separate
> > token, but it didn't. Could you please advise us on why this
> > happens?
> > 
> > Is there something wrong in the configuration? Do we need
> > something additional for cTAKES to recognize the code alone as a
> > separate token? Is there any other way to get the respective
> > ICD/CPT code of the identified annotation from cTAKES, such as
> > querying the CPT/ICD table using the fetched CUI? Kindly advise.
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > -----Original Message-----
> > From: Finan, Sean <Se...@childrens.harvard.edu>
> > Sent: Monday, September 14, 2020 9:35 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Abad,
> > 
> > 
> > I think that you need to make only one minor change.
> > 
> > 
> > cTAKES uses "tokens" for identification, not the actual text.
> > Tokenization turns text such as "F84.1" into "F84 . 1": the first
> > token is 'F84', followed by a token for '.' and another for '1'.
> > In the .script file this is indicated by adding a space between
> > each token. This makes the full entry:
> > 
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,3, 'F84 . 1','F84')
> > 
> > 
> > Notice that the token length is now 3 and the full text contains
> > the between-token spaces. This carries forward to the other
> > entries, such as:
> > 
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,3,4, 'F84 . 1 pdd', 'pdd')
> > 
> > 
> > Sean
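
The tokenization rule described in this message can be sketched in Python. This is only an illustration: the regular expression is an assumption that approximates the behavior described here (runs of letters and digits stay together, every other non-space character becomes its own token). It is not the cTAKES tokenizer, and `tokenize_term` and `cui_terms_row` are hypothetical helper names.

```python
import re

def tokenize_term(text):
    """Approximate the split described above: 'F84.1' -> ['F84', '.', '1'].

    Assumption: alphanumeric runs stay together and any other
    non-space character is a token of its own. The real cTAKES
    tokenizer may behave differently on other inputs.
    """
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)

def cui_terms_row(cui, text, rword):
    """Build a CUI_TERMS INSERT line whose TEXT has space-separated tokens."""
    tokens = tokenize_term(text)
    rindex = tokens.index(rword)   # 0-based position of the lookup word
    tcount = len(tokens)           # number of tokens in the entry
    spaced = " ".join(tokens)
    return "INSERT INTO CUI_TERMS VALUES({},{},{},'{}','{}')".format(
        cui, rindex, tcount, spaced, rword)

print(cui_terms_row(4352, "F84.1", "F84"))
print(cui_terms_row(4352, "F84.1 pdd", "pdd"))
```

The sketch raises a ValueError if the chosen lookup word is not one of the tokens, which is a quick way to catch an entry whose RWORD still contains the untokenized form such as 'F84.1'.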
> > 
> > 
> > ________________________________
> > From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> > Sent: Monday, September 14, 2020 11:32 AM
> > To: dev@ctakes.apache.org
> > Subject: RE: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES [EXTERNAL]
> > 
> > Hi Team,
> > 
> > I hope you are all doing well. With your support, we were able to
> > successfully add our required synonyms to the existing dictionary,
> > and we could see that cTAKES picked them up. We now have a
> > requirement to configure ICD and CPT as well, so we followed the
> > steps in the cTAKES wiki and generated the respective .script
> > file.
> > 
> > The newly created dictionary, which comprises
> > SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as
> > expected, but we also need to extract the ICD code for the
> > respective description. The scenario would be a text like the one
> > below:
> > 
> > ‘F84.1 pervasive developmental disorders’
> > 
> > We need cTAKES to recognize F84.1 as a token, or at least as an
> > attribute of an 'IdentifiedAnnotation'. To achieve this, based on
> > our prior experience, we tried to tweak the dictionary by adding
> > synonyms for the existing CUI as below:
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,1,4, 'F84.1 pervasive developmental disorders', 'pervasive')
> > INSERT INTO CUI_TERMS VALUES(4352,1,2, 'F84.1 pdd', 'pdd')
> > INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')
> > 
> > We have seen that cTAKES can identify 'F84' alone as a token, but
> > it does not match the entry whenever a '.' is encountered. As a
> > result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as
> > separate tokens. Since almost all ICD codes contain a '.', this way
> > of tweaking the dictionary does not work. In fact, cTAKES
> > recognizes the digit after the decimal point within a
> > 'FractionAnnotation'.
> > 
> > Can cTAKES return a code such as the ICD code, either as an
> > individual token or as an attribute of one of the tokens?
> > 
> > Is there any other way to tweak the dictionary so that a synonym
> > addition like the one below makes cTAKES recognize and return the
> > ICD code as a token?
> > 
> > INSERT INTO CUI_TERMS VALUES(4352,0,1, 'F84.1','F84.1')
> > 
> > 
> > Kindly check and advise us on how to proceed.
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > From: Remy Sanouillet <re...@foreseemed.com>
> > Sent: Tuesday, June 2, 2020 7:23 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES
> > 
> > Hi Abad,
> > 
> > •       How can we point cTAKES application to multiple
> > dictionaries.
> > Currently only sno_rx_16ab is pointed to the application, how can I
> > tweak
> > it to point that to multiple dictionary simultaneously. Or you
> > meant to say
> > create a fresh dictionary with all the vocabularies and point just
> > that in
> > cTAKES.
> > 
> > If you go back in the archive a bit, you should find a thread where
> > I went
> > into detail on how to add multiple dictionaries. Combining all
> > dictionaries
> > into a fresh dictionary is not recommended for obvious reasons. If
> > you
> > can't find the thread, I will dig it up.
> > 
> > •       So for these edits I will have to add INSERT queries to
> > respective
> > tables in the sno_rx_16ab.script file right? Do I need to make any
> > more
> > changes for these tokens to get reflected in cTAKES.
> > 
> > Nope! That is all that is needed and next time you launch cTakes,
> > it
> > should recognize your new entries.
> > 
> > •       If it is a non-existing CUI , I can get the respective
> > CUI,TUI
> > from here
> > https://uts.nlm.nih.gov//metathesaurus.html
> > right?
> > 
> > Correct! Remember that the ontology has multiple-inheritance so you
> > need
> > to grab all the TUIs for a given CUI.
> > 
> > •       Based on the source I will have to add entry to respective
> > table
> > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > one of it
> > and not in all. Correct me if am wrong on this understanding
> > 
> > That is also correct. And most of the time, the dictionaries only
> > contain
> > one CODE table so it is not even a question. However, sno_rx_16ab
> > is an
> > exception with both a CODE table for SNOMEDCT_US and RXNORM. They
> > mostly do
> > not overlap. I do remember that there were a couple of exceptions
> > but, in
> > the case where that happens, the metathesaurus will show it.
> > For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes
> > (372815001 and 68088000) *and* an RXNORM of 149.
> > 
> > •       PREFTERM table will be having only one entry for each CUI
> > right?
> > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > Correct me
> > if am wrong on this understanding.
> > 
> > You are correct here also. It is a one-to-one mapping although the
> > system
> > appears to tolerate when the PREFTERM is missing.
> > 
> > Rémy Sanouillet
> > NLP Engineer
> > remys@foreseemed.com
> > 
> > 
> > ForeSee Medical, Inc.
> > 12555 High Bluff Drive, Suite 100
> > San Diego, CA 92130
> > 
> > NOTICE: This e-mail message and all attachments transmitted with it
> > are
> > intended solely for the use of the addressee and may contain
> > legally
> > privileged and confidential information. If the reader of this
> > message is
> > not the intended recipient, or an employee or agent responsible for
> > delivering this message to the intended recipient, you are hereby
> > notified
> > that any dissemination, distribution, copying, or other use of this
> > message
> > or its attachments is strictly prohibited. If you have received
> > this
> > message in error, please notify the sender immediately by replying
> > to this
> > message and please delete it from your computer.
> > 
> > 
> > On Mon, Jun 1, 2020 at 7:56 AM <Abad.Ayyub@cognizant.com> wrote:
> > Thank you Remy and Peter for your responses. I hope you guys are
> > doing
> > good and safe in this lock down period. Could you pls. help me on
> > my below
> > queries in creating an additional dictionary.
> > 
> > 
> > •       How to create additional dictionary. You meant to say using
> > the
> > UMLS tool , so that using that tool we create .script files from
> > .RRF files?
> > 
> > •       How can we point cTAKES application to multiple
> > dictionaries.
> > Currently only sno_rx_16ab is pointed to the application, how can I
> > tweak
> > it to point that to multiple dictionary simultaneously. Or you
> > meant to say
> > create a fresh dictionary with all the vocabularies and point just
> > that in
> > cTAKES.
> > 
> > I hope Remy was explaining editing the existing dictionary where I
> > would
> > deal with two scenarios where one was with existing CUI and other
> > was with
> > Non-existing CUI. Could you pls. resolve the below queries
> > regarding the
> > same.
> > 
> > 
> > •       So for these edits I will have to add INSERT queries to
> > respective
> > tables in the sno_rx_16ab.script file right? Do I need to make any
> > more
> > changes for these tokens to get reflected in cTAKES.
> > 
> > •       If it is a non-existing CUI , I can get the respective
> > CUI,TUI
> > from here
> > https://uts.nlm.nih.gov//metathesaurus.html
> > right?
> > 
> > •       Based on the source I will have to add entry to respective
> > table
> > right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either
> > one of it
> > and not in all. Correct me if am wrong on this understanding
> > 
> > •       PREFTERM table will be having only one entry for each CUI
> > right?
> > Basically it’s a one-to-one mapping between CUI and PREFTERM .
> > Correct me
> > if am wrong on this understanding.
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > From: Remy Sanouillet <remys@foreseemed.com>
> > Sent: Friday, May 29, 2020 9:25 PM
> > To: dev@ctakes.apache.org
> > Cc: user@ctakes.apache.org
> > Subject: Re: Building a new custom dictionary or Updating/Adding
> > values to
> > the existing dictionary in cTAKES
> > 
> > Hello Abad,
> > 
> > The short answer is, yes, the sno_rx_16ab can be "hacked". A couple
> > of
> > caveats are that any mistake can stop all recognition and you will
> > lose all
> > your mods on updates. So an additional dictionary is a recommended
> > approach.
> > 
> > There are two cases. In the first, the CUI you are adding already
> > exists and you are just adding a synonym. In that case, you only
> > need to add one line:
> > INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
> > where:
> > 
> >   *   CUI is the cui, nuf'said
> >   *   TEXT is the tokenized lowercase string for the entry. In your
> > case
> > 'pap smear'. Most punctuation is a separate token. Single quotes
> > are
> > escaped by doubling them
> >   *   RWORD is the one token in TEXT that is the most indicative
> > (least
> > common) which will be used as the index in the lookup. In your case
> > probably 'pap' since it is not as common as 'smear'
> >   *   RINDEX is the index of RWORD in TEXT. First token is 0 which
> > is the
> > case for 'pap'
> >   *   TCOUNT is the token count for TEXT. In your case, 2
> > So you would want to add:
> > INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
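
The column arithmetic above can be sketched in Python. This is an illustration only: `build_synonym_insert` and `sql_quote` are hypothetical helpers, and the sketch assumes TEXT is already lowercase and tokenized (tokens separated by single spaces), as described in the list.

```python
def sql_quote(value):
    # Single quotes are escaped by doubling them, as noted for TEXT.
    return "'" + value.replace("'", "''") + "'"

def build_synonym_insert(cui, text, rword):
    """Derive RINDEX and TCOUNT from TEXT and RWORD and format the line."""
    tokens = text.split(" ")
    if rword not in tokens:
        raise ValueError("RWORD must be one of the tokens in TEXT")
    rindex = tokens.index(rword)   # first token has index 0
    tcount = len(tokens)           # token count for TEXT
    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rindex, tcount, sql_quote(text), sql_quote(rword))

print(build_synonym_insert(200845, "pap smear", "pap"))
```

Deriving RINDEX and TCOUNT from the text, rather than typing them by hand, avoids the mismatch where the counts no longer agree with the tokenized TEXT.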
> > 
> > In the second case, the entry does not exist yet and you will need
> > to add a few more lines. Their positions are unimportant as long as
> > they are below the header lines (below the final "SET SCHEMA
> > PUBLIC" line).
> > 
> >   1.  INSERT INTO TUI VALUES(CUI,TUI)
> > One line for each TUI in the taxonomy
> >   2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are
> > adding a
> > SNOMED
> >   3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is
> > the
> > pretty string to describe the entry. It need not correspond to any
> > indexed
> > entry. It is used for display once the lookup has been successful.
> > That's it. Use at your own discretion. No guarantees.
> > 
> > 
> > Rémy Sanouillet
> > NLP Engineer
> > remys@foreseemed.com<ma...@foreseemed.com>
> > 
> > 
> > 
> > ForeSee Medical, Inc.
> > 12555 High Bluff Drive, Suite 100
> > San Diego, CA 92130
> > 
> > 
> > 
> > On Fri, May 29, 2020 at 7:34 AM <Abad.Ayyub@cognizant.com> wrote:
> > Hi Team,
> > 
> > We recently set up cTAKES 4.0.0 as the NLP engine for our profile.
> > We have faced situations where some of the expected tokens are not
> > picked up by cTAKES during clinical text extraction. Our first
> > thought was to identify where the dictionary is configured and how
> > it can be updated. After some code analysis, we found that the
> > dictionary for the RxNorm and SNOMEDCT_US sources is configured in
> > the path below, under ctakes/resources.
> > 
> > We were able to open the HSQLDB database using the HSQLDB GUI and
> > found that some of our required entries are already there. Coming
> > specifically to our current problem: Pap Smear and Mammogram are
> > two clinical terms that cTAKES does not currently recognize in our
> > profile.
> > 
> > •       Looking at the .script file, Pap Smear and
> > Mammogram/Mammography are already present in the respective
> > tables. Please find a snapshot below.
> > 
> > 
> > But still these terms were not recognized by cTAKES. I see there
> > are some filters working on top of the available dictionary entries
> > (ctakes-gui and ctakes-gui-res). Could these filters be the reason
> > the tokens are not recognized as expected? Could you please tell us
> > what exactly these filters do? This will also help us in the future
> > when we try to add new terms to the dictionary.
> > 
> > 
> > 
> > •       What are the steps to add or edit entries in the existing
> > dictionaries? I see we can add or edit the existing values in the
> > .script files, but our primary doubt is this: if I have a term
> > 'xyz' to add to the dictionary, how can I get the CUI and the other
> > values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give
> > any random value for TUI/CUI/RINDEX/TCOUNT? I could also see
> > options to create custom BSV dictionaries but couldn't find much
> > documentation for them. Kindly advise which of the 3 options below
> > is better.
> > 
> > 
> > 
> > o   Generate a custom dictionary using the MetamorphoSys UMLS
> > installation tool (where we provide the sources ICD10, RxNORM and
> > SNOMEDCT_US) and leverage the full set of .rrf files in the meta
> > folder. Is this approach better if there are many entries to
> > populate?
> > 
> > o   Add to or edit the available dictionary sno_rx_16ab. In that
> > case, how do we provide valid values for columns such as CUI, TUI,
> > RINDEX, TCOUNT and PREFTERM? Is this approach better if there are
> > only a few entries to populate?
> > 
> > o   Use a custom BSV. In that case, how should we add values to the
> > custom BSV? Could you also provide a sample?
> > 
> > I found a Metathesaurus browser at the URL below, where I can
> > search for terms and get the CUI and the respective source, such as
> > ICD/CPT/MDR. But I was still unable to get the other required
> > attributes, such as TUI, RINDEX, TCOUNT and PREFTERM. Could you
> > please briefly explain what these attributes signify?
> > 
> > 
> > https://uts.nlm.nih.gov//metathesaurus.html
> > 
> > Kindly advise us on how to proceed on this and correct us if we
> > went wrong
> > somewhere. This would be of great help for us
> > 
> > P.S : We comply with UMLS license
> > 
> > 
> > Thanks & Regards
> > 
> > Abad Ayyub
> > Vnet: 406170 | Cell : +91-9447379028
> > 
> > 
> > 
> > This e-mail and any files transmitted with it are for the sole use
> > of the
> > intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the
> > sender and destroy all copies of the original message. Any
> > unauthorized
> > review, use, disclosure, dissemination, forwarding, printing or
> > copying of
> > this email, and/or any action taken in reliance on the contents of
> > this
> > e-mail is strictly prohibited and may be unlawful. Where permitted
> > by
> > applicable law, this e-mail and other e-mail communications sent to
> > and
> > from Cognizant e-mail addresses may be monitored. This e-mail and
> > any files
> > transmitted with it are for the sole use of the intended
> > recipient(s) and
> > may contain confidential and privileged information. If you are not
> > the
> > intended recipient(s), please reply to the sender and destroy all
> > copies of
> > the original message. Any unauthorized review, use, disclosure,
> > dissemination, forwarding, printing or copying of this email,
> > and/or any
> > action taken in reliance on the contents of this e-mail is strictly
> > prohibited and may be unlawful. Where permitted by applicable law,
> > this
> > e-mail and other e-mail communications sent to and from Cognizant
> > e-mail
> > addresses may be monitored.
> > This e-mail and any files transmitted with it are for the sole use
> > of the
> > intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the
> > sender and destroy all copies of the original message. Any
> > unauthorized
> > review, use, disclosure, dissemination, forwarding, printing or
> > copying of
> > this email, and/or any action taken in reliance on the contents of
> > this
> > e-mail is strictly prohibited and may be unlawful. Where permitted
> > by
> > applicable law, this e-mail and other e-mail communications sent to
> > and
> > from Cognizant e-mail addresses may be monitored.
> > This e-mail and any files transmitted with it are for the sole use
> > of the
> > intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the
> > sender and destroy all copies of the original message. Any
> > unauthorized
> > review, use, disclosure, dissemination, forwarding, printing or
> > copying of
> > this email, and/or any action taken in reliance on the contents of
> > this
> > e-mail is strictly prohibited and may be unlawful. Where permitted
> > by
> > applicable law, this e-mail and other e-mail communications sent to
> > and
> > from Cognizant e-mail addresses may be monitored.
> > This e-mail and any files transmitted with it are for the sole use
> > of the
> > intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the
> > sender and destroy all copies of the original message. Any
> > unauthorized
> > review, use, disclosure, dissemination, forwarding, printing or
> > copying of
> > this email, and/or any action taken in reliance on the contents of
> > this
> > e-mail is strictly prohibited and may be unlawful. Where permitted
> > by
> > applicable law, this e-mail and other e-mail communications sent to
> > and
> > from Cognizant e-mail addresses may be monitored.
> > This e-mail and any files transmitted with it are for the sole use
> > of the
> > intended recipient(s) and may contain confidential and privileged
> > information. If you are not the intended recipient(s), please reply
> > to the
> > sender and destroy all copies of the original message. Any
> > unauthorized
> > review, use, disclosure, dissemination, forwarding, printing or
> > copying of
> > this email, and/or any action taken in reliance on the contents of
> > this
> > e-mail is strictly prohibited and may be unlawful. Where permitted
> > by
> > applicable law, this e-mail and other e-mail communications sent to
> > and
> > from Cognizant e-mail addresses may be monitored.
> >

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.

Sean, this conversation raises a question that I've had for a while.
Does the term-finding mechanism actually use a treebank to find the POS, or
does it use another, less rigorous approach? If it were rigorous,
wouldn't it be able to tag a pure number as an NN in the role of object if
it played the corresponding role in the sentence?

I've not had the same problem as Ayyub, but I have been wondering why one
needed to disable the identification of cm as a genetic acronym because of
situations where cm is clearly part of a unit of measure and would show up
as an entity's modifier in a treebank.

Does the question make sense?

Peter

On Tue, Sep 15, 2020, 9:02 AM Finan, Sean <Se...@childrens.harvard.edu>
wrote:

> I should mention that going the Paragraph route would only impact term
> lookup.
> ________________________________________
> From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> Sent: Tuesday, September 15, 2020 11:54 AM
> To: dev@ctakes.apache.org
> Subject: RE: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES [EXTERNAL]
>
> Thank you Sean for the response. We shall definitely try that. I have
> one question on the "F84.1" problem: since we have now developed a lot of
> features based on the output from cTAKES, is the impact of changing the
> sentenceDetectorAnnotator going to be huge?
>
> Thanks & Regards
>
> Abad Ayyub
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
> -----Original Message-----
> From: Finan, Sean <Se...@childrens.harvard.edu>
> Sent: Tuesday, September 15, 2020 9:06 PM
> To: dev@ctakes.apache.org
> Subject: Re: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES [EXTERNAL]
>
> Hi Abad,
>
> The first thing that I would try for the "97112" problem is changing the
> parts of speech that are ignored for lookup.  Right now a pure number is
> ignored - it is not a word.  So, similar to what I said in my previous
> email, change the dictionary lookup parameter exclusionTags.  But to make
> sure that you get everything, you can first try no exclusions:
> set exclusionTags=""
>
> My guess with the F84.1 problem is that your sentence splitter is
> splitting "F84.1" but not splitting "F84 . 1".
>
> I think that the best way to start debugging is adding the
> PrettyTextWriter to the end of the piper and looking at its output (see my
> previous email).   It will print each sentence on a line and indicate the
> part of speech for each token.  If you can quickly and easily see what the
> system is doing then you might start to understand what needs to be changed
> to fit your data.
>
> Sean
> ________________________________________
> From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> Sent: Tuesday, September 15, 2020 11:15 AM
> To: dev@ctakes.apache.org
> Subject: RE: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES [EXTERNAL]
>
> Thank you Sean for the detailed response. I think there was a
> miscommunication from our end about the requirement. Your solution of adding
> spaces between the tokens worked, but it required the input text to have
> the spaces as well. If the text came in as 'F84.1', cTAKES didn't recognize
> the token, but if the text came in as 'F84 . 1', cTAKES recognized the
> tokens for the INSERT statement below.
>
> INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
>
> But we encountered a similar issue when we configured an INSERT entry as
> below for CPT codes,
>
> INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')
>
> Where 97112 is a CPT code (which usually doesn’t have decimals or '.'). We
> expected cTAKES to recognize the CPT code '97112' as a separate token, but
> it didn't. Could you please advise us on why this issue came up?
>
> Is there something wrong in the configuration? Do we need
> something additional for cTAKES to recognize the code alone as a separate
> token? Is there any other way to get the respective
> ICD/CPT code of the identified annotation from cTAKES, like querying the
> CPT/ICD table using the fetched CUI? Kindly advise.
>
>
> Thanks & Regards
>
> Abad Ayyub
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
> -----Original Message-----
> From: Finan, Sean <Se...@childrens.harvard.edu>
> Sent: Monday, September 14, 2020 9:35 PM
> To: dev@ctakes.apache.org
> Subject: Re: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES [EXTERNAL]
>
> Hi Abad,
>
>
> I think that you need to make only one minor change.
>
>
> cTAKES uses "tokens" for identification and not the actual text.
> Tokenization turns text such as "F84.1" into "F84 . 1": the first token
> is F84, followed by a token encompassing '.' and another with '1'.  The
> manner in which this is indicated in the .script file is by adding a space
> between each token.  This makes the full entry:
>
>
> INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')
>
>
> Notice that the token length is now 3 and the full text contains the
> between-token spaces.  This would carry forward for the other entries, such
> as:
>
>
> INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
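
The tokenized-entry rule described above can be sketched in Python. This is only an approximation: the regular expression is an assumption standing in for the real cTAKES tokenizer, which has more rules, and the helper names (tokenize, sql_quote, cui_terms_row) are hypothetical and not part of cTAKES. It may still help when checking the token count and index values by hand.

```python
import re

# Approximation only: the real cTAKES tokenizer has more rules than this.
# A run of letters/digits is one token; any other non-space character
# becomes a token of its own.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def tokenize(text):
    """Split text into tokens, e.g. 'F84.1' -> ['F84', '.', '1']."""
    return TOKEN_PATTERN.findall(text)


def sql_quote(value):
    """Wrap a string in straight single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def cui_terms_row(cui, text, rword):
    """Build a CUI_TERMS row whose TEXT has one space between tokens."""
    tokens = tokenize(text)
    rindex = tokens.index(rword)  # raises ValueError if rword is not a token
    spaced_text = " ".join(tokens)
    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rindex, len(tokens), sql_quote(spaced_text), sql_quote(rword)
    )


print(cui_terms_row(4352, "F84.1", "F84"))
print(cui_terms_row(4352, "F84.1 pdd", "pdd"))
```

Under these assumptions the two calls reproduce the two entries above, with token counts 3 and 4.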
>
>
> Sean
>
>
> ________________________________
> From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
> Sent: Monday, September 14, 2020 11:32 AM
> To: dev@ctakes.apache.org
> Subject: RE: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES [EXTERNAL]
>
> Hi Team,
>
> I hope you all are doing well. With your support, we were able to
> successfully add our required synonyms to the existing dictionary and could
> see that they were picked up by cTAKES. Now we have a
> requirement to configure ICD and CPT as well, for which we followed the steps
> in the cTAKES wiki and generated the respective .script file.
>
> The newly created dictionary, which comprises
> SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected,
> but we have a requirement to extract the ICD code for the respective
> description. The scenario would be a text like the one below:
>
> ‘F84.1 pervasive developmental disorders’
>
> We need cTAKES to recognize F84.1 as a token, or at least as an
> attribute of an ‘IdentifiedAnnotation’. To achieve this,
> based on our prior experience, we tried to tweak the dictionary by
> adding synonyms for the existing CUI as below:
>
> INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
> INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
> INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
>
> We have seen that cTAKES can identify ‘F84’ alone as a token, but it
> does not match the entry whenever a ‘.’ is encountered. As a result, cTAKES
> is not able to give ICD codes like F84.1 or M25.6 as separate tokens.
> Since almost all ICD codes contain a ‘.’, this way
> of tweaking the dictionary is not working. In fact, cTAKES recognizes the
> digit after the decimal point within a ‘FractionAnnotation’.
>
> Does cTAKES have the capability to return a code such as the ICD code,
> either as an individual token or as an attribute of
> a token?
>
> Is there any other way in which the dictionary can be tweaked so that a
> synonym addition like the one below will recognize the ICD code as a token
> that is returned from cTAKES?
>
> INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
>
>
> Kindly check and advise us on how to proceed on this situation
>
> Thanks & Regards
>
> Abad Ayyub
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
> From: Remy Sanouillet <re...@foreseemed.com>
> Sent: Tuesday, June 2, 2020 7:23 AM
> To: dev@ctakes.apache.org
> Subject: Re: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES
>
> Hi Abad,
>
> •       How can we point cTAKES application to multiple dictionaries.
> Currently only sno_rx_16ab is pointed to the application, how can I tweak
> it to point that to multiple dictionary simultaneously. Or you meant to say
> create a fresh dictionary with all the vocabularies and point just that in
> cTAKES.
>
> If you go back in the archive a bit, you should find a thread where I went
> into detail on how to add multiple dictionaries. Combining all dictionaries
> into a fresh dictionary is not recommended for obvious reasons. If you
> can't find the thread, I will dig it up.
>
> •       So for these edits I will have to add INSERT queries to respective
> tables in the sno_rx_16ab.script file right? Do I need to make any more
> changes for these tokens to get reflected in cTAKES.
>
> Nope! That is all that is needed and next time you launch cTakes, it
> should recognize your new entries.
>
> •       If it is a non-existing CUI , I can get the respective CUI,TUI
> from here
> https://uts.nlm.nih.gov//metathesaurus.html
> right?
>
> Correct! Remember that the ontology has multiple-inheritance so you need
> to grab all the TUIs for a given CUI.
>
> •       Based on the source I will have to add entry to respective table
> right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it
> and not in all. Correct me if am wrong on this understanding
>
> That is also correct. And most of the time, the dictionaries only contain
> one CODE table so it is not even a question. However, sno_rx_16ab is an
> exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do
> not overlap. I do remember that there were a couple of exceptions but, in
> the case where that happens, the metathesaurus will show it.
> For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes
> (372815001 and 68088000) *and* an RXNORM of 149.
>
> •       PREFTERM table will be having only one entry for each CUI right?
> Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me
> if am wrong on this understanding.
>
> You are correct here also. It is a one-to-one mapping although the system
> appears to tolerate when the PREFTERM is missing.
>
> Rémy Sanouillet
> NLP Engineer
> remys@foreseemed.com<ma...@foreseemed.com>
>
>
> ForeSee Medical, Inc.
> 12555 High Bluff Drive, Suite 100
> San Diego, CA 92130
>
> NOTICE: This e-mail message and all attachments transmitted with it are
> intended solely for the use of the addressee and may contain legally
> privileged and confidential information. If the reader of this message is
> not the intended recipient, or an employee or agent responsible for
> delivering this message to the intended recipient, you are hereby notified
> that any dissemination, distribution, copying, or other use of this message
> or its attachments is strictly prohibited. If you have received this
> message in error, please notify the sender immediately by replying to this
> message and please delete it from your computer.
>
>
> On Mon, Jun 1, 2020 at 7:56 AM <Abad.Ayyub@cognizant.com<mailto:
> Abad.Ayyub@cognizant.com>> wrote:
> Thank you Remy and Peter for your responses. I hope you guys are doing
> good and safe in this lock down period. Could you pls. help me on my below
> queries in creating an additional dictionary.
>
>
> •       How to create additional dictionary. You meant to say using the
> UMLS tool , so that using that tool we create .script files from .RRF files?
>
> •       How can we point cTAKES application to multiple dictionaries.
> Currently only sno_rx_16ab is pointed to the application, how can I tweak
> it to point that to multiple dictionary simultaneously. Or you meant to say
> create a fresh dictionary with all the vocabularies and point just that in
> cTAKES.
>
> I hope Remy was explaining editing the existing dictionary where I would
> deal with two scenarios where one was with existing CUI and other was with
> Non-existing CUI. Could you pls. resolve the below queries regarding the
> same.
>
>
> •       So for these edits I will have to add INSERT queries to respective
> tables in the sno_rx_16ab.script file right? Do I need to make any more
> changes for these tokens to get reflected in cTAKES.
>
> •       If it is a non-existing CUI , I can get the respective CUI,TUI
> from here
> https://uts.nlm.nih.gov//metathesaurus.html
> right?
>
> •       Based on the source I will have to add entry to respective table
> right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it
> and not in all. Correct me if am wrong on this understanding
>
> •       PREFTERM table will be having only one entry for each CUI right?
> Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me
> if am wrong on this understanding.
>
>
> Thanks & Regards
>
> Abad Ayyub
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
> From: Remy Sanouillet <re...@foreseemed.com>>
> Sent: Friday, May 29, 2020 9:25 PM
> To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
> Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
> Subject: Re: Building a new custom dictionary or Updating/Adding values to
> the existing dictionary in cTAKES
>
> Hello Abad,
>
> The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of
> caveats are that any mistake can stop all recognition and you will lose all
> your mods on updates. So an additional dictionary is a recommended approach.
>
> There are two cases. In the first, the CUI you are adding already exists and
> you are just adding a synonym. In that case, you only need to add one line:
> INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
> where:
>
>   *   CUI is the cui, nuf'said
>   *   TEXT is the tokenized lowercase string for the entry. In your case
> 'pap smear'. Most punctuation is a separate token. Single quotes are
> escaped by doubling them
>   *   RWORD is the one token in TEXT that is the most indicative (least
> common) which will be used as the index in the lookup. In your case
> probably 'pap' since it is not as common as 'smear'
>   *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the
> case for 'pap'
>   *   TCOUNT is the token count for TEXT. In your case, 2
> So you would want to add:
> INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
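
As a rough illustration of how the five values relate, here is a small Python sketch that derives RINDEX, TCOUNT and RWORD from a term. The function names and the frequency table are hypothetical; in particular, the counts are made up only to show 'pap' being rarer than 'smear', and the text is assumed to be already separated into tokens by spaces.

```python
def pick_rword(tokens, frequency):
    """Return (index, token) of the least common token; ties go to the first."""
    index = min(range(len(tokens)), key=lambda i: frequency.get(tokens[i], 0))
    return index, tokens[index]


def sql_quote(value):
    """Wrap a string in straight single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def synonym_row(cui, text, frequency):
    """Build one CUI_TERMS row for a synonym of an existing CUI."""
    # Assumes the text is already space-separated into tokens.
    tokens = text.lower().split()
    rindex, rword = pick_rword(tokens, frequency)
    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rindex, len(tokens), sql_quote(" ".join(tokens)), sql_quote(rword)
    )


# Hypothetical counts, only to make 'pap' the rarer token.
FREQUENCY = {"pap": 12, "smear": 85}

print(synonym_row(200845, "Pap Smear", FREQUENCY))
```

With these made-up counts the call yields the same row as the 'pap smear' line above. Tokens missing from the table are treated as the rarest, which is a design choice of this sketch and not documented cTAKES behavior.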
>
>  If the entry is a non-existing one, you will need to add a few more
> lines. Their positions are unimportant as long as they are below the header
> lines (below the final "SET SCHEMA PUBLIC" line).
>
>   1.  INSERT INTO TUI VALUES(CUI,TUI)
> One line for each TUI in the taxonomy
>   2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a
> SNOMED
>   3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the
> pretty string to describe the entry. It need not correspond to any indexed
> entry. It is used for display once the lookup has been successful.
> That's it. Use at your own discretion. No guarantees.
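
The extra lines for a new concept can be generated with a short Python sketch like the one below. Several things here are assumptions: that the tables store the CUI as a number without its 'C' prefix (as the 200845 in the synonym example suggests), that TUI and SNOMED codes are also stored as plain numbers, and the helper names themselves. The CUI, TUI and SNOMED code used in the call are made-up placeholders, so check the CREATE TABLE lines at the top of your own .script file before relying on any of this.

```python
def cui_number(cui):
    """'C0000946' -> 946; assumes the tables store the CUI without its 'C'."""
    return int(cui.lstrip("Cc"))


def new_concept_rows(cui, tuis, snomed_codes, prefterm):
    """Build the TUI, SNOMEDCT_US and PREFTERM rows for a new CUI."""
    n = cui_number(cui)
    # One TUI row per semantic type of the concept.
    rows = ["INSERT INTO TUI VALUES({},{})".format(n, tui) for tui in tuis]
    # One SNOMEDCT_US row per source code.
    rows += [
        "INSERT INTO SNOMEDCT_US VALUES({},{})".format(n, code)
        for code in snomed_codes
    ]
    # A single PREFTERM row; single quotes are escaped by doubling them.
    rows.append(
        "INSERT INTO PREFTERM VALUES({},'{}')".format(n, prefterm.replace("'", "''"))
    )
    return rows


# Placeholder values only; none of these identifiers are real.
for row in new_concept_rows("C9999999", [47], [123456789], "Example's term"):
    print(row)
```

A CUI_TERMS row for each synonym is still needed in addition to these lines.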
>
>
> Rémy Sanouillet
> NLP Engineer
> remys@foreseemed.com<ma...@foreseemed.com>
>
>
>
> ForeSee Medical, Inc.
> 12555 High Bluff Drive, Suite 100
> San Diego, CA 92130
>
> NOTICE: This e-mail message and all attachments transmitted with it are
> intended solely for the use of the addressee and may contain legally
> privileged and confidential information. If the reader of this message is
> not the intended recipient, or an employee or agent responsible for
> delivering this message to the intended recipient, you are hereby notified
> that any dissemination, distribution, copying, or other use of this message
> or its attachments is strictly prohibited. If you have received this
> message in error, please notify the sender immediately by replying to this
> message and please delete it from your computer.
>
>
> On Fri, May 29, 2020 at 7:34 AM <Abad.Ayyub@cognizant.com<mailto:
> Abad.Ayyub@cognizant.com>> wrote:
> Hi Team,
>
> We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have
> faced situations where some of the expected tokens are not picked up by
> cTAKES during clinical text extraction. So our first thought was to
> identify where the dictionary is configured and how it can be updated.
> After some code analysis, we found that the dictionary is configured in
> the below path under ctakes/resources for sources RxNorm and SNOMEDCT_US
>
> We were able to open the hsqldb using the HSQLDB GUI and found that
> some of our required entries are already there. Coming specifically
> to our current problem: Pap Smear and Mammogram are two clinical terms
> which are not currently recognized by cTAKES in our profile.
>
> •       If I look into the .script file, Pap Smear and
> Mammogram/Mammography are already present in the .script file and in the
> respective tables. Please find a snapshot below.
>
>
> But still these were not recognized by cTAKES. I see there are some filters
> working on top of the available entries in the dictionary (ctakes-gui and
> ctakes-gui-res). Could it be because of these filters that the tokens are not
> recognized as expected? Could you please tell us what exactly these filters
> do? This will also help us in future when we try to add new terms
> to the dictionary.
>
>
>
> •       What are the steps if we need to add/edit entries in the
> existing dictionaries? I see we can add/edit the existing values in the
> .script files, but our primary doubt is: if I have a term ‘xyz’ to
> add to the dictionary, how can I get the CUI and other values like
> TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value
> for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom
> bsv dictionaries but couldn’t find much documentation for them. Kindly advise
> which is the better option of the 3 below.
>
>
>
> o   Generate a custom dictionary using the MetamorphoSys UMLS installation
> tool (where we provide sources as ICD10, RxNORM, SNOMEDCT_US) and leverage the
> full set of .rrf files in the meta folder. Is this approach better if
> many entries are to be added?
>
> o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we
> provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and
> PREFTERM? Is this approach better if only a few entries are to be
> added?
>
> o   Use a custom bsv. In that case, how should we add values to the custom
> bsv? Could you also provide a sample?
>
> I found a Metathesaurus browser at the URL below, where I can search for
> terms and get the CUI and the respective source like ICD/CPT/MDR. But
> I was still unable to get the other required attributes
> like TUI, RINDEX, TCOUNT and PREFTERM. Could you please explain what these
> attributes signify?
>
>
> https://uts.nlm.nih.gov//metathesaurus.html
>
> Kindly advise us on how to proceed on this and correct us if we went wrong
> somewhere. This would be of great help for us
>
> P.S : We comply with UMLS license
>
>
> Thanks & Regards
>
> Abad Ayyub
> Vnet: 406170 | Cell : +91-9447379028
>
>
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored. This e-mail and any files
> transmitted with it are for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. If you are not the
> intended recipient(s), please reply to the sender and destroy all copies of
> the original message. Any unauthorized review, use, disclosure,
> dissemination, forwarding, printing or copying of this email, and/or any
> action taken in reliance on the contents of this e-mail is strictly
> prohibited and may be unlawful. Where permitted by applicable law, this
> e-mail and other e-mail communications sent to and from Cognizant e-mail
> addresses may be monitored.
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

I should mention that going the Paragraph route would only impact term lookup.
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:54 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *

Thank you, Sean, for the response. We shall definitely try that. I have one question on the "F84.1" problem: since we have now developed a lot of features based on the output from cTAKES, is the impact of changing the sentenceDetectorAnnotator going to be large?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Thank you, Sean, for the detailed response. I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens in the entries worked, but it required the input text to have the spaces as well. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes the tokens for the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

But we encountered a similar issue when we configured an INSERT entry as below for CPT codes,

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually have no decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it didn't. Could you please advise us on why this happens?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Abad,

I think that you need to make only one minor change.

cTAKES uses "tokens" for identification and not the actual text. Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'. This is indicated in the .script file by adding a space between each token, which makes the full entry:

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:

INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
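
The token arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the regular expression is a simplified stand-in that treats every punctuation character as its own token, not the actual cTAKES tokenizer, and `to_dictionary_text` is a hypothetical helper name.

```python
import re

# Simplified stand-in for tokenization (an assumption for illustration, not
# the real cTAKES tokenizer): a run of letters/digits is one token, and every
# other non-space character is a token of its own.
_TOKEN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def to_dictionary_text(surface):
    """Return (TEXT, TCOUNT) for a surface string such as 'F84.1'."""
    tokens = _TOKEN.findall(surface)
    return " ".join(tokens), len(tokens)


if __name__ == "__main__":
    for surface in ("F84.1", "F84.1 pdd", "97112"):
        text, tcount = to_dictionary_text(surface)
        print(f"{surface!r} -> TEXT={text!r}, TCOUNT={tcount}")
```

Under these assumptions "F84.1" becomes the three-token TEXT 'F84 . 1', while a plain number such as "97112" stays a single token, so any lookup failure for it would have to come from somewhere other than the token count.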

Sean

________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Hi Team,

I hope you all are doing well. With your support, we were able to successfully add our required synonyms to the existing dictionary and could see that they were picked up by cTAKES. We now have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario is a text like the one below:

‘F84.1 pervasive developmental disorders’

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the ‘IdentifiedAnnotation’ objects. To achieve this, based on our prior experience, we tweaked the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify ‘F84’ alone as a token, but it does not match whenever a ‘.’ is encountered. As a result, cTAKES cannot return ICD codes like F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a ‘.’, this way of tweaking the dictionary is not working. In fact, cTAKES recognizes the digit after the decimal point within a ‘FractionAnnotation’.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way in which the dictionary can be tweaked, so that a synonym addition like the one below will make cTAKES recognize and return the ICD code as a token?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.

On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.

•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the cui, nuf'said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
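
As a minimal sketch of the column rules just listed, the Python helper below derives RINDEX and TCOUNT from an already tokenized TEXT and a chosen RWORD, and doubles any single quotes. The function name `cui_terms_insert` is hypothetical, and the lowercasing simply follows the description above; it is not taken from cTAKES itself.

```python
def cui_terms_insert(cui, text, rword):
    """Build one CUI_TERMS line from a tokenized, space-separated TEXT."""
    tokens = text.lower().split()
    rword = rword.lower()
    if rword not in tokens:
        raise ValueError(f"RWORD {rword!r} is not a token of {text!r}")
    rindex = tokens.index(rword)  # position of RWORD in TEXT, first token is 0
    tcount = len(tokens)          # number of tokens in TEXT

    def quote(value):
        # Single quotes inside a literal are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    return (f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
            f"{quote(' '.join(tokens))},{quote(rword)})")


if __name__ == "__main__":
    print(cui_terms_insert(200845, "pap smear", "pap"))
```

Choosing RWORD is left to the caller because "least common token" depends on the whole dictionary, which this sketch does not look at.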

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
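
The extra lines for a new CUI could be generated along these lines. This is a sketch under assumptions: `new_concept_rows` is a hypothetical helper, the CUI, TUI and SNOMED values in the usage example are made-up placeholders rather than real UMLS data, and writing CUI, TUI and the SNOMED code as bare numbers is a guess that should be checked against the CREATE TABLE lines and existing rows in your own .script file.

```python
def _quote(value):
    # Single quotes inside a string literal are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"


def new_concept_rows(cui, tuis, snomed_code, prefterm):
    """Return the extra INSERT lines for a CUI not yet in the dictionary.

    Assumption: CUI, TUI and the SNOMED code are written as bare numbers,
    and PREFTERM as a quoted string.
    """
    # One TUI line per semantic type of the concept.
    rows = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    rows.append(f"INSERT INTO SNOMEDCT_US VALUES({cui},{snomed_code})")
    rows.append(f"INSERT INTO PREFTERM VALUES({cui},{_quote(prefterm)})")
    return rows


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for row in new_concept_rows(9999999, [60, 59], 123456789, "Example procedure"):
        print(row)
```

The CUI_TERMS line for the searchable text is still needed in addition to these rows, as in the synonym case.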

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some of the expected tokens are not picked up by cTAKES during clinical text extraction. So our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the sources RxNorm and SNOMEDCT_US is configured in the path below, under ctakes/resources.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. Coming to our current problem: Pap Smear and Mammogram are two clinical terms which are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.

But these were still not recognized by cTAKES. I see there are some filters working on top of the available entries in the dictionary (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please share what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.

•       What are the steps if we need to add or edit entries in the existing dictionaries? I see we can add or edit the existing values in the .script files, but our primary doubt is this: if I have a term ‘xyz’ to add to the dictionary, how can I get the CUI and other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for the TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn’t find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add to or edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for columns like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV. In that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source like ICD/CPT/MDR. But I was still unable to get the other required attributes like TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed on this and correct us if we went wrong somewhere. This would be of great help for us

P.S : We comply with UMLS license

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Abad,

Changing the sentence detector will make a change, but in terms of term lookup I wouldn't call it "huge".  I would still spot-check a series of notes to see how it impacts your data specifically.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:54 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *

Thank you Sean for the response. We shall definitely try that way. I have one question on the "f84.1" problem, since we have now developed a lot of features based on the output from cTAKES, is the impact of changing the sentenceDetectorAnnotator going to be huge?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""
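
To illustrate why a bare number can drop out of lookup, here is a minimal Python sketch of part-of-speech exclusion. It is not cTAKES code: the function name, the sample tags and the exclusion list are assumptions made up for the example, and the real default list should be checked in your own dictionary lookup configuration.

```python
def lookup_tokens(tagged_tokens, exclusion_tags):
    """Keep only the tokens whose part-of-speech tag is not excluded.

    tagged_tokens: list of (word, pos_tag) pairs (hypothetical input shape).
    exclusion_tags: comma-separated tags, mimicking an exclusionTags setting.
    """
    excluded = {tag for tag in exclusion_tags.split(",") if tag}
    return [word for word, tag in tagged_tokens if tag not in excluded]


# Assumed tagging: a pure number such as 97112 tagged as a cardinal number (CD).
sentence = [("cpt", "NN"), ("97112", "CD"), ("billed", "VBN")]

print(lookup_tokens(sentence, "CD,VBN"))  # the number is filtered out
print(lookup_tokens(sentence, ""))        # no exclusions: every token is kept
```

With an empty exclusion list every token stays a lookup candidate, which is the effect the no-exclusions setting above is meant to have.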

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *

Thank you Sean for the detailed response.  I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens worked, but it required the input text to have the spaces as well. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes it with the INSERT below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

We encountered a similar issue when we configured an INSERT entry for CPT codes, as below:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually don't contain decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it didn't. Could you please advise us on why this issue came up?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, like querying the CPT/ICD table using the fetched CUI? Kindly advise.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]

Hi Abad,

I think that you need to make only one minor change.

cTAKES uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  In the .script file this is indicated by adding a space between tokens.  This makes the full entry:

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:

INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
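
The token splitting described above can be approximated with a short Python sketch. This is only a rough stand-in for the cTAKES tokenizer, under the assumption that runs of letters and digits form one token and every other non-space character is its own token; the function names are hypothetical.

```python
import re


def rough_tokens(text):
    """Approximate tokenization: alphanumeric runs, plus each
    punctuation character as a separate token (an assumption,
    not the actual cTAKES tokenizer rules)."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)


def script_text(text):
    """Join the tokens with single spaces, as a .script TEXT value expects."""
    return " ".join(rough_tokens(text))


print(script_text("F84.1"))                   # F84 . 1
print(len(rough_tokens("F84.1 pdd")))         # 4 tokens
print(rough_tokens("F84.1 pdd").index("pdd"))  # pdd is at index 3
```

The token count and index printed here correspond to the 4 and 3 used in the 'F84 . 1 pdd' entry. For real data, the PrettyTextWriter output remains the authoritative view of how cTAKES tokenized a note.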

Sean

________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *

Hi Team,

I hope you all are doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see that they were picked up by cTAKES. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we have a requirement to extract the ICD code for the respective description. So the scenario would be, for a text like the one below:

'F84.1 pervasive developmental disorders'

we would need cTAKES to recognize F84.1 as a token, or at least as an attribute of an 'IdentifiedAnnotation'. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match the entry whenever a '.' is encountered. As a result, cTAKES is not able to give ICD codes like F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary is not working. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way in which the dictionary can be tweaked so that a synonym addition like the one below will recognize the ICD code as a token and return it from cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from https://uts.nlm.nih.gov//metathesaurus.html, right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. Most of the time a dictionary contains only one CODE table, so it is not even a question. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, where that happens, the Metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.

On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.

•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from https://uts.nlm.nih.gov//metathesaurus.html, right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028

From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap', since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
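
The column rules above can be sketched as a small Python helper that derives RINDEX and TCOUNT from an already tokenized TEXT and a chosen RWORD. The function name is hypothetical, and the sketch assumes TEXT is passed in with tokens separated by single spaces and that the CUI is written as a plain number, as in the example line above.

```python
def cui_terms_insert(cui, text, rword):
    """Build one CUI_TERMS line from space-separated tokens.

    RINDEX is the 0-based position of rword in the token list and
    TCOUNT is the number of tokens. Raises ValueError if rword is
    not one of the tokens.
    """
    tokens = text.lower().split()
    rword = rword.lower()
    rindex = tokens.index(rword)
    tcount = len(tokens)

    def quote(value):
        # Single quotes inside a value are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    return "INSERT INTO CUI_TERMS VALUES({},{},{},{},{})".format(
        cui, rindex, tcount, quote(" ".join(tokens)), quote(rword)
    )


print(cui_terms_insert(200845, "Pap smear", "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

Choosing the rare word is still a judgment call; the helper only checks that it is one of the tokens.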

If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI), with one line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED code
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
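
As a hedged sketch, the extra lines for a new CUI could be generated as follows. The function name and all values in the usage example are made-up placeholders, not real UMLS identifiers. The sketch also assumes the TUI and code columns hold plain numbers; check the CREATE TABLE lines in your own .script file, since a column defined as text would need quoted values.

```python
def new_concept_lines(cui, tuis, snomed_codes, prefterm):
    """Return the TUI, SNOMEDCT_US and PREFTERM lines for one new CUI."""

    def quote(value):
        # Single quotes inside a value are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    # One TUI line per semantic type of the concept.
    lines = ["INSERT INTO TUI VALUES({},{})".format(cui, tui) for tui in tuis]
    # One code line per SNOMED code mapped to the concept.
    lines += [
        "INSERT INTO SNOMEDCT_US VALUES({},{})".format(cui, code)
        for code in snomed_codes
    ]
    # A single display string for the concept.
    lines.append("INSERT INTO PREFTERM VALUES({},{})".format(cui, quote(prefterm)))
    return lines


# Placeholder values for illustration only.
for line in new_concept_lines(9999999, [60], [123456], "Example procedure"):
    print(line)
```

The matching CUI_TERMS line for each synonym is still needed in addition to these lines.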

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com

ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction. Our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the RxNorm and SNOMEDCT_US sources is configured in the path below, under ctakes/resources.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come to our current problem specifically: Pap Smear and Mammogram are two clinical terms which are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.

But these were still not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.

•       What are the steps if we need to add or edit entries in the existing dictionaries? I see we can add or edit values in the .script files, but our primary doubt is: if I have a term 'xyz' to add to the dictionary, how can I get the CUI and the other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn't find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide ICD10, RxNORM and SNOMEDCT_US as sources) and leverage the full set of .RRF files in the META folder. Is this approach better if there are many entries to be populated?

o   Add to or edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for each column like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to be populated?

o   Use a custom BSV file. In that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source like ICD/CPT/MDR. But I was still unable to get the other required attributes like TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed on this and correct us if we went wrong somewhere. This would be of great help for us

P.S : We comply with UMLS license

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Ab...@cognizant.com.

Thank you Sean for the response. We shall definitely try that way. I have one question on the "f84.1" problem, since we have now developed a lot of features based on the output from cTAKES, is the impact of changing the sentenceDetectorAnnotator going to be huge?

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, September 15, 2020 9:06 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Thank you, Sean, for the detailed response. I think we miscommunicated the requirement. Your solution of adding spaces between the tokens worked, but only when the input text also contained the spaces. If the text came in as 'F84.1', cTAKES did not recognize the token; if it came in as 'F84 . 1', cTAKES recognized it with the INSERT below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

We encountered a similar issue when we added the INSERT entry below for a CPT code:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually contain no decimal point or '.'). We expected cTAKES to recognize '97112' as a separate token, but it did not. Could you please advise why this happened?

Is something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token? Is there any other way to get the ICD/CPT code of an identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,


I think that you need to make only one minor change.


ctakes uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  In the .script file this is indicated by a space between each pair of tokens.  This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token count is now 3 and the full text contains the between-token spaces.  This carries forward to the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
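
A rough way to preview how an entry needs to be spaced is sketched below in Python. This is not the cTAKES tokenizer; it is a simplified stand-in that only assumes what is described above (runs of letters and digits form one token, each punctuation character is its own token). The function names rough_tokens and dictionary_text are made up for this illustration.

```python
import re

def rough_tokens(text):
    """Approximate tokenization: alphanumeric runs, plus one token per punctuation mark.

    Assumption: this mimics only the behaviour described in the thread,
    not the real cTAKES tokenizer rules.
    """
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)

def dictionary_text(text):
    """Join the tokens with single spaces, as the TEXT column expects."""
    return " ".join(rough_tokens(text))

tokens = rough_tokens("F84.1 pdd")
print(tokens)                      # ['F84', '.', '1', 'pdd']
print(dictionary_text("F84.1 pdd"))  # F84 . 1 pdd
print(len(tokens), tokens.index("pdd"))  # 4 3
```

The last line gives the token count and the index of 'pdd', which match the 4 and 3 in the second INSERT above. The PrettyTextWriter output remains the authoritative view of what cTAKES actually produces.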


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see them being picked up by cTAKES. We now have a requirement to configure ICD and CPT as well, so we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario is a text like the one below:

'F84.1 pervasive developmental disorders'

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' results. Based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but not once a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point as part of a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below makes cTAKES recognize and return the ICD code as a token?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed on this situation

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed and next time you launch cTakes, it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. Most of the time, the dictionaries only contain one CODE table, so it is not even a question. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.
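
The one-CUI-to-several-codes relationship in that example can be sketched as a lookup by CUI. The Python below uses an in-memory SQLite database purely as a stand-in for the HSQLDB dictionary; the column name CODE and the helper codes_for are assumptions made for this illustration, and the real tables in sno_rx_16ab may use different column names and types.

```python
import sqlite3

# Stand-in database (assumption: SQLite instead of the real HSQLDB .script file).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SNOMEDCT_US(CUI INTEGER, CODE INTEGER);
CREATE TABLE RXNORM(CUI INTEGER, CODE INTEGER);
-- Acebutolol, CUI C0000946, stored without the 'C' prefix and leading zeros
INSERT INTO SNOMEDCT_US VALUES(946, 372815001);
INSERT INTO SNOMEDCT_US VALUES(946, 68088000);
INSERT INTO RXNORM VALUES(946, 149);
""")

def codes_for(cui):
    """Return every code per source table for one CUI (hypothetical helper)."""
    result = {}
    for table in ("SNOMEDCT_US", "RXNORM"):
        rows = conn.execute(
            f"SELECT CODE FROM {table} WHERE CUI = ? ORDER BY CODE", (cui,)
        ).fetchall()
        result[table] = [row[0] for row in rows]
    return result

print(codes_for(946))
# {'SNOMEDCT_US': [68088000, 372815001], 'RXNORM': [149]}
```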

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases: either the CUI you are adding already exists and you are just adding a synonym, or it does not exist yet. In the first case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry, in your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap', since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
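
As a sanity check on RINDEX and TCOUNT, the line can be derived from TEXT and RWORD rather than counted by hand. The Python below is only a sketch; the function name cui_terms_insert is made up for this illustration, and it assumes TEXT is already tokenized, lowercase and space-separated as described above.

```python
def cui_terms_insert(cui, text, rword):
    """Build one CUI_TERMS line from an already-tokenized, lowercase TEXT."""
    tokens = text.split(" ")
    if rword not in tokens:
        raise ValueError(f"{rword!r} is not a token of {text!r}")
    rindex = tokens.index(rword)  # zero-based position of RWORD in TEXT
    tcount = len(tokens)          # number of tokens in TEXT

    def quote(value):
        # Single quotes inside the string are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"{quote(text)},{quote(rword)})"
    )

print(cui_terms_insert(200845, "pap smear", "pap"))
# INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
```

Choosing RWORD (the least common token) is still a judgment call; the sketch only guarantees that the index and count agree with the text.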

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI), with one line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
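
The extra lines for a new CUI can be generated the same way. The Python below is a sketch under assumptions: the helper name new_concept_inserts is hypothetical, the CUI, TUI and SNOMED values in the usage line are placeholders and not real concept data, and codes are written unquoted on the assumption that the code column is numeric in your dictionary.

```python
def new_concept_inserts(cui, tuis, snomed_codes, prefterm):
    """Return the TUI, SNOMEDCT_US and PREFTERM lines for a CUI not yet in the dictionary."""
    def quote(value):
        # Single quotes are escaped by doubling them.
        return "'" + value.replace("'", "''") + "'"

    # One TUI line per semantic type of the concept.
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    # One code line per SNOMED code (assumed numeric, hence unquoted).
    lines += [f"INSERT INTO SNOMEDCT_US VALUES({cui},{code})" for code in snomed_codes]
    # Exactly one preferred term, used for display only.
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},{quote(prefterm)})")
    return lines

# Placeholder values for illustration only.
for line in new_concept_inserts(9999999, [60, 61], [123456789], "Example procedure"):
    print(line)
```

The CUI_TERMS line for each synonym is still needed in addition to these; without it nothing is indexed for lookup.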


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction. Our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the below path under ctakes/resources for sources RxNorm and SNOMEDCT_US

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come to our current problem specifically: Pap Smear and Mammogram are two clinical terms that cTAKES does not currently recognize in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the .script file and in the respective tables. Please find a snapshot below.

But still this was not recognized by cTAKES. I see there are some filters working on top of the available entries in the dictionary (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.



•       What are the steps if we need to add or edit entries in the existing dictionaries? I see we can add or edit the existing values in the .script files, but our primary doubt is this: if I have a term 'xyz' to add to the dictionary, how can I get the CUI and the other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give arbitrary values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn't find much documentation for them. Kindly advise which of the 3 options below is better.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab; in that case, how do we provide valid values for columns such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV; in that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source such as ICD/CPT/MDR. But I was still unable to get the other required attributes, such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed on this and correct us if we went wrong somewhere. This would be of great help for us

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028




Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Abad,

The first thing that I would try for the "97112" problem is changing the parts of speech that are ignored for lookup.  Right now a pure number is ignored - it is not a word.  So, similar to what I said in my previous email, change the dictionary lookup parameter exclusionTags.  But to make sure that you get everything, you can first try no exclusions:
set exclusionTags=""

My guess with the F84.1 problem is that your sentence splitter is splitting "F84.1" but not splitting "F84 . 1".

I think that the best way to start debugging is adding the PrettyTextWriter to the end of the piper and looking at its output (see my previous email).   It will print each sentence on a line and indicate the part of speech for each token.  If you can quickly and easily see what the system is doing then you might start to understand what needs to be changed to fit your data.

Sean
________________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Tuesday, September 15, 2020 11:15 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Thank you, Sean, for the detailed response. I think we miscommunicated the requirement. Your solution of adding spaces between the tokens worked, but only when the input text also contained the spaces. If the text came in as 'F84.1', cTAKES did not recognize the token; if it came in as 'F84 . 1', cTAKES recognized it with the INSERT below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

We encountered a similar issue when we added the INSERT entry below for a CPT code:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually contain no decimal point or '.'). We expected cTAKES to recognize '97112' as a separate token, but it did not. Could you please advise why this happened?

Is something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token?
Is there any other way to get the ICD/CPT code of an identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,


I think that you need to make only one minor change.


ctakes uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  In the .script file this is indicated by a space between each pair of tokens.  This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token count is now 3 and the full text contains the between-token spaces.  This carries forward to the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see that cTAKES picked them up. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario is a text like the one below:

'F84.1 pervasive developmental disorders'

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' objects. Based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match once a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below recognizes the ICD code as a token and returns it from cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed, and the next time you launch cTAKES, it should recognize your new entries.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from here: https://uts.nlm.nih.gov//metathesaurus.html, right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. Most of the time, a dictionary contains only one CODE table, so the question does not even arise. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember a couple of exceptions but, where that happens, the Metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.
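
As a rough illustration of a CUI that appears in more than one CODE table, the sketch below loads the Acebutolol example into an in-memory SQLite database and looks up its codes by CUI. Everything about the layout is an assumption made for illustration: the real dictionary is an HSQLDB .script file, the column names CUI and CODE are hypothetical, and the CUI is kept as the string quoted above rather than in whatever form the dictionary stores it.

```python
import sqlite3

# In-memory stand-in for the dictionary; not the actual HSQLDB schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SNOMEDCT_US (CUI TEXT, CODE TEXT)")
conn.execute("CREATE TABLE RXNORM (CUI TEXT, CODE TEXT)")
conn.executemany(
    "INSERT INTO SNOMEDCT_US VALUES (?, ?)",
    [("C0000946", "372815001"), ("C0000946", "68088000")],
)
conn.execute("INSERT INTO RXNORM VALUES (?, ?)", ("C0000946", "149"))

def codes_for(cui):
    """Collect every code stored for a CUI, grouped by CODE table."""
    found = {}
    for table in ("SNOMEDCT_US", "RXNORM"):
        rows = conn.execute(
            f"SELECT CODE FROM {table} WHERE CUI = ? ORDER BY CODE", (cui,)
        ).fetchall()
        found[table] = [row[0] for row in rows]
    return found

print(codes_for("C0000946"))
```

The lookup returns two SNOMEDCT_US codes and one RXNORM code for the same CUI, which is the overlap case described above.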

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com>> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from here: https://uts.nlm.nih.gov//metathesaurus.html, right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. Either the CUI you are adding already exists and you are just adding a synonym, or it does not exist yet. In the first case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the cui, nuf'said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
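
The field rules above can be sketched as a small helper that derives RINDEX and TCOUNT from the text instead of counting by hand. This is only an illustration: cui_terms_insert is a hypothetical name, and it assumes the text passed in is already split into tokens separated by single spaces.

```python
def sql_quote(value):
    # Single quotes are escaped by doubling them, as noted above.
    return "'" + value.replace("'", "''") + "'"

def cui_terms_insert(cui, tokenized_text, rword):
    text = tokenized_text.lower()   # TEXT is stored lowercase
    tokens = text.split(" ")        # assumes one space between tokens
    rword = rword.lower()
    rindex = tokens.index(rword)    # ValueError if RWORD is not one of the tokens
    tcount = len(tokens)
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"{sql_quote(text)},{sql_quote(rword)})"
    )

print(cui_terms_insert(200845, "Pap smear", "pap"))
```

For the 'pap smear' case this produces the same line as the one written out above. Since a single bad line can stop all recognition, generating the counts this way may be safer than typing them.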

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED) assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM) where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
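
A minimal sketch of generating the extra lines for a new concept, assuming the three formats listed above. The values used are made-up placeholders, not real UMLS identifiers, and writing the TUI and SNOMED code unquoted is an assumption; the CREATE TABLE statements near the top of the .script file show the actual column types.

```python
def sql_quote(value):
    # Single quotes inside a string are escaped by doubling them.
    return "'" + value.replace("'", "''") + "'"

def new_concept_inserts(cui, tuis, snomed_codes, prefterm):
    # One TUI line per semantic type, one line per SNOMED code, one PREFTERM line.
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    lines += [f"INSERT INTO SNOMEDCT_US VALUES({cui},{code})" for code in snomed_codes]
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},{sql_quote(prefterm)})")
    return lines

# Placeholder values for illustration only.
for line in new_concept_inserts(9999999, [1, 2], [123456789], "Example concept"):
    print(line)
```

The CUI_TERMS line for each synonym is still needed in addition to these.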


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com>> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some of the expected tokens are not picked up by cTAKES during clinical text extraction. Our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary for the RxNorm and SNOMEDCT_US sources is configured in the path below, under ctakes/resources.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. To come to our current problem specifically: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       In the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.

But still these were not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we add new terms to the dictionary.



•       What are the steps if we need to add or edit entries in the existing dictionaries? I see that we can add or edit values in the .script files, but our primary doubt is this: if I have a term 'xyz' to add to the dictionary, how can I get the CUI and the other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give arbitrary values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but could not find much documentation for them. Kindly advise which of the 3 options below is better.



o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM and SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for each column, such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV. In that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source, such as ICD/CPT/MDR. But I was still unable to get the other required attributes, such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed on this and correct us if we went wrong somewhere. This would be of great help for us

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by Ab...@cognizant.com.

Thank you, Sean, for the detailed response. I think there was a miscommunication from our end about the requirement. Your solution of adding spaces between the tokens worked, but only when the input text also contained the spaces. If the text comes in as 'F84.1', cTAKES does not recognize the token, but if the text comes in as 'F84 . 1', cTAKES recognizes it with the INSERT statement below.

INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')

We encountered a similar issue when we configured the INSERT entry below for CPT codes:

INSERT INTO CUI_TERMS VALUES(41154,0,1,'97112','97112')

Here 97112 is a CPT code (CPT codes usually do not contain decimals or '.'). We expected cTAKES to recognize the CPT code '97112' as a separate token, but it did not. Could you please advise us on why this happens?

Is there something wrong in the configuration? Do we need something additional for cTAKES to recognize the code alone as a separate token?
Is there any other way to get the respective ICD/CPT code of the identified annotation from cTAKES, such as querying the CPT/ICD table using the fetched CUI? Kindly advise.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Monday, September 14, 2020 9:35 PM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

[External]


Hi Abad,


I think that you need to make only one minor change.


cTAKES uses "tokens" for identification and not the actual text.  Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'.  In the .script file this is indicated by adding a space between each token.  This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token length is now 3 and the full text contains the between-token spaces.  This would carry forward for the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see that cTAKES picked them up. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for the respective description. The scenario is a text like the one below:

'F84.1 pervasive developmental disorders'

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' objects. Based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match once a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below recognizes the ICD code as a token and returns it from cTAKES?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed, and the next time you launch cTAKES, it should recognize your new entries.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from here: https://uts.nlm.nih.gov//metathesaurus.html, right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. Most of the time, a dictionary contains only one CODE table, so the question does not even arise. However, sno_rx_16ab is an exception, with CODE tables for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember a couple of exceptions but, where that happens, the Metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com>> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry, in your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them.
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap', since it is not as common as 'smear'.
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'.
  *   TCOUNT is the token count for TEXT. In your case, 2.
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
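The field rules above can be sketched in a few lines of Python. This is only an illustration, not part of cTAKES: the helper name cui_terms_insert is made up, and it assumes the caller passes TEXT already split into space-separated tokens and picks RWORD by hand.

```python
def cui_terms_insert(cui: int, text: str, rare_word: str) -> str:
    """Build one CUI_TERMS row from an already-tokenized term (hypothetical helper)."""
    tokens = text.lower().split()        # TEXT is lowercase, tokens separated by spaces
    rword = rare_word.lower()
    rindex = tokens.index(rword)         # RINDEX: position of RWORD, first token is 0
    tcount = len(tokens)                 # TCOUNT: number of tokens in TEXT
    # Single quotes are escaped by doubling them
    sql_text = " ".join(tokens).replace("'", "''")
    sql_rword = rword.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{tcount},"
        f"'{sql_text}','{sql_rword}')"
    )


print(cui_terms_insert(200845, "Pap smear", "pap"))
```

For the 'pap smear' case this prints the same line as the one given above. A ValueError from tokens.index means the chosen RWORD is not actually one of the tokens in TEXT.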

If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
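As a rough sketch of the extra lines for a new CUI, the Python below generates them from a list of TUIs and SNOMED codes. The helper name and every value in the call are invented placeholders, not real UMLS data. It also assumes the CUI, TUI and SNOMED columns take unquoted numbers, as the CUI does in the CUI_TERMS example; check the CREATE TABLE lines in your own .script file before relying on that.

```python
def new_concept_inserts(cui, tuis, snomed_codes, prefterm):
    """Return the extra .script lines for a CUI not yet in the dictionary.

    Hypothetical helper; assumes numeric CUI/TUI/SNOMED columns.
    """
    # One TUI row per semantic type of the concept
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    # One code row per SNOMED code mapped to the concept
    lines += [f"INSERT INTO SNOMEDCT_US VALUES({cui},{code})" for code in snomed_codes]
    # PREFTERM is a display string; single quotes are doubled
    escaped = prefterm.replace("'", "''")
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},'{escaped}')")
    return lines


# Placeholder values only, chosen for illustration
for line in new_concept_inserts(9999999, [60], [123456789], "Example procedure"):
    print(line)
```

The CUI_TERMS line for the term itself is still needed in addition to these rows.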


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com>> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction, so our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the path below under ctakes/resources for the sources RxNorm and SNOMEDCT_US.

We were able to open the HSQLDB database using the HSQL DB GUI and found that some of our required entries are already there. Coming specifically to our current problem: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.









But these terms were still not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please explain what exactly these filters do? This will also help us in the future when we try to add new terms to the dictionary.

•       What steps are needed to add or edit entries in the existing dictionaries? I see we can add or edit the existing values in the .script files, but our primary doubt is this: if I have a term ‘xyz’ to add to the dictionary, how can I get the CUI and the other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give arbitrary values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn’t find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if many entries are to be populated?

o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for each column, such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if only a few entries are to be populated?

o   Use a custom BSV file. In that case, how should we add values to it? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source, such as ICD/CPT/MDR. But I was still unable to get the other required attributes, such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. Where permitted by applicable law, this e-mail and other e-mail communications sent to and from Cognizant e-mail addresses may be monitored.

Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Abad,


I think that you need to make only one minor change.


cTAKES uses "tokens" for identification and not the actual text. Tokenization turns text such as "F84.1" into "F84 . 1": the first token is F84, followed by a token for '.' and another for '1'. The .script file indicates this with a space between each token. This makes the full entry:


INSERT INTO CUI_TERMS VALUES(4352,0,3,'F84 . 1','F84')


Notice that the token count is now 3 and the full text contains the between-token spaces. This carries forward to the other entries, such as:


INSERT INTO CUI_TERMS VALUES(4352,3,4,'F84 . 1 pdd','pdd')
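To make the token counting concrete, here is a small Python sketch that splits a raw string the way described above and derives RINDEX and TCOUNT from the result. The regular expression is only a rough stand-in for the cTAKES tokenizer, which it does not reproduce, and both function names are invented for this example. Case is left as typed to mirror the two entries above; the earlier reply in this thread describes TEXT as lowercase, so lowercasing may be needed in practice.

```python
import re


def rough_tokens(text):
    """Approximate tokenization: runs of letters/digits stay together,
    every other non-space character becomes its own token."""
    return re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]", text)


def cui_terms_row(cui, raw_text, rare_word):
    """Build a CUI_TERMS row from untokenized text (hypothetical helper)."""
    tokens = rough_tokens(raw_text)
    rindex = tokens.index(rare_word)                 # position of RWORD among the tokens
    joined = " ".join(tokens).replace("'", "''")     # tokens separated by single spaces
    rword = rare_word.replace("'", "''")
    return (
        f"INSERT INTO CUI_TERMS VALUES({cui},{rindex},{len(tokens)},"
        f"'{joined}','{rword}')"
    )


print(rough_tokens("F84.1"))
print(cui_terms_row(4352, "F84.1", "F84"))
print(cui_terms_row(4352, "F84.1 pdd", "pdd"))
```

With this splitting, "F84.1 pdd" has four tokens and 'pdd' sits at index 3, which is why the second entry above uses 3,4 rather than 1,2.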


Sean


________________________________
From: Abad.Ayyub@cognizant.com <Ab...@cognizant.com>
Sent: Monday, September 14, 2020 11:32 AM
To: dev@ctakes.apache.org
Subject: RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES [EXTERNAL]

* External Email - Caution *


Hi Team,

I hope you all are doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see them being picked up by cTAKES. Now we have a requirement to configure ICD and CPT as well, for which we followed the steps in the cTAKES wiki and generated the respective .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we have a requirement to extract the ICD code for the respective description. So the scenario would be for a text like the one below:

‘F84.1 pervasive developmental disorders’

We would need cTAKES to recognize F84.1 as a token, or at least as an attribute in any of the ‘IdentifiedAnnotation’ objects. To achieve this, based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify ‘F84’ alone as a token, but it does not match the full code whenever a ‘.’ is encountered. As a result, cTAKES is not able to return ICD codes like F84.1 or M25.6 as separate tokens. Since almost all ICD codes contain a ‘.’, this way of tweaking the dictionary is not working. In fact, cTAKES recognizes the digit after the decimal point within a ‘FractionAnnotation’.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below will make cTAKES recognize the ICD code as a token and return it?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

Nope! That is all that is needed, and the next time you launch cTAKES it should recognize your new entries.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with both a CODE table for SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com>> wrote:
Thank you Remy and Peter for your responses. I hope you guys are doing good and safe in this lock down period. Could you pls. help me on my below queries in creating an additional dictionary.


•       How to create additional dictionary. You meant to say using the UMLS tool , so that using that tool we create .script files from .RRF files?

•       How can we point cTAKES application to multiple dictionaries. Currently only sno_rx_16ab is pointed to the application, how can I tweak it to point that to multiple dictionary simultaneously. Or you meant to say create a fresh dictionary with all the vocabularies and point just that in cTAKES.

I hope Remy was explaining editing the existing dictionary where I would deal with two scenarios where one was with existing CUI and other was with Non-existing CUI. Could you pls. resolve the below queries regarding the same.


•       So for these edits I will have to add INSERT queries to respective tables in the sno_rx_16ab.script file right? Do I need to make any more changes for these tokens to get reflected in cTAKES.

•       If it is a non-existing CUI , I can get the respective CUI,TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source I will have to add entry to respective table right? Like SNOMED,RxNORM,ICD 10 and a CUI will belong to either one of it and not in all. Correct me if am wrong on this understanding

•       PREFTERM table will be having only one entry for each CUI right? Basically it’s a one-to-one mapping between CUI and PREFTERM . Correct me if am wrong on this understanding.


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



From: Remy Sanouillet <re...@foreseemed.com>>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org<ma...@ctakes.apache.org>
Cc: user@ctakes.apache.org<ma...@ctakes.apache.org>
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. In the first, the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the CUI, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry, in your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them.
  *   RWORD is the one token in TEXT that is the most indicative (least common), which will be used as the index in the lookup. In your case probably 'pap', since it is not as common as 'smear'.
  *   RINDEX is the index of RWORD in TEXT. The first token is 0, which is the case for 'pap'.
  *   TCOUNT is the token count for TEXT. In your case, 2.
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
One line for each TUI in the taxonomy
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
assuming you are adding a SNOMED
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com>> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction, so our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the path below under ctakes/resources for the sources RxNorm and SNOMEDCT_US.

We were able to open the HSQLDB database using the HSQL DB GUI and found that some of our required entries are already there. Coming specifically to our current problem: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the respective tables. Please find a snapshot below.









But these terms were still not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could these filters be the reason the tokens are not recognized as expected? Could you please explain what exactly these filters do? This will also help us in the future when we try to add new terms to the dictionary.

•       What steps are needed to add or edit entries in the existing dictionaries? I see we can add or edit the existing values in the .script files, but our primary doubt is this: if I have a term ‘xyz’ to add to the dictionary, how can I get the CUI and the other values such as TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give arbitrary values for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn’t find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM, SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if many entries are to be populated?

o   Add/edit the available dictionary sno_rx_16ab. In that case, how do we provide valid values for each column, such as CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if only a few entries are to be populated?

o   Use a custom BSV file. In that case, how should we add values to it? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source, such as ICD/CPT/MDR. But I was still unable to get the other required attributes, such as TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S : We comply with UMLS license


Thanks & Regards

Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028




RE: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Posted by Ab...@cognizant.com.

Hi Team,

I hope you are all doing well. With your support, we were able to add our required synonyms to the existing dictionary and could see them being picked up by cTAKES. We now have a requirement to configure ICD and CPT as well, so we followed the steps in the cTAKES wiki and generated the corresponding .script file.

The newly created dictionary, which comprises SNOMEDCT_US, RxNORM, ICD10 and CPT, identifies the descriptions as expected, but we also need to extract the ICD code for each description. The scenario would be a text like the one below:

‘F84.1 pervasive developmental disorders’

We need cTAKES to recognize F84.1 as a token, or at least as an attribute of one of the 'IdentifiedAnnotation' objects. Based on our prior experience, we tried to tweak the dictionary by adding synonyms for the existing CUI as below:

INSERT INTO CUI_TERMS VALUES(4352,1,4,'F84.1 pervasive developmental disorders','pervasive')
INSERT INTO CUI_TERMS VALUES(4352,1,2,'F84.1 pdd','pdd')
INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')

We have seen that cTAKES can identify 'F84' alone as a token, but it does not match once a '.' is encountered. As a result, cTAKES cannot return ICD codes such as F84.1 or M25.6 as single tokens. Since almost all ICD codes contain a '.', this way of tweaking the dictionary does not work. In fact, cTAKES recognizes the digit after the decimal point within a 'FractionAnnotation'.

Does cTAKES have the capability to return a code such as the ICD code, either as an individual token or as an attribute of one of the tokens?

Is there any other way to tweak the dictionary so that a synonym addition like the one below makes cTAKES recognize the ICD code as a token and return it?

INSERT INTO CUI_TERMS VALUES(4352,0,1,'F84.1','F84.1')
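
A possible explanation, sketched under an assumption: Remy's earlier note in this thread says that TEXT must be the tokenized lowercase string and that most punctuation is a separate token. If the period in 'F84.1' is split off in the same way, the code would occupy three tokens in TEXT rather than one, and RINDEX and TCOUNT would shift accordingly. The Python sketch below uses a hypothetical split_tokens helper built on a simple regular expression. It is not the cTAKES tokenizer, so the result should be checked against the tokens cTAKES actually produces for the same text.

```python
import re

def split_tokens(text):
    """Lowercase the text and split it into word and punctuation tokens.

    Hypothetical stand-in for the real tokenizer: a run of letters/digits
    is one token, every other non-space character is its own token.
    """
    return re.findall(r"[a-z0-9]+|[^a-z0-9\s]", text.lower())

tokens = split_tokens("F84.1 pervasive developmental disorders")
print(tokens)
print("TEXT  :", " ".join(tokens))          # tokens joined by single spaces
print("TCOUNT:", len(tokens))               # number of tokens in TEXT
print("RINDEX:", tokens.index("pervasive")) # position of the chosen RWORD
```

Under this assumption the first entry above would need a TCOUNT of 6 and an RINDEX of 3 instead of 4 and 1, with TEXT written in lowercase.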


Kindly check and advise us on how to proceed in this situation.

Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


From: Remy Sanouillet <re...@foreseemed.com>
Sent: Tuesday, June 2, 2020 7:23 AM
To: dev@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hi Abad,

•       How can we point the cTAKES application to multiple dictionaries? Currently only sno_rx_16ab is configured; how can I tweak it to point to multiple dictionaries simultaneously? Or did you mean creating a fresh dictionary with all the vocabularies and pointing cTAKES to just that one?

If you go back in the archive a bit, you should find a thread where I went into detail on how to add multiple dictionaries. Combining all dictionaries into a fresh dictionary is not recommended for obvious reasons. If you can't find the thread, I will dig it up.

•       So for these edits I will have to add INSERT statements to the respective tables in the sno_rx_16ab.script file, right? Do I need to make any other changes for these tokens to be reflected in cTAKES?

Nope! That is all that is needed and next time you launch cTAKES, it should recognize your new entries.

•       If it is a non-existing CUI, I can get the respective CUI and TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

Correct! Remember that the ontology has multiple-inheritance so you need to grab all the TUIs for a given CUI.

•       Based on the source, I will have to add an entry to the respective table, right? Like SNOMED, RxNORM, ICD 10; and a CUI will belong to only one of them, not all. Correct me if I am wrong in this understanding.

That is also correct. And most of the time, the dictionaries only contain one CODE table so it is not even a question. However, sno_rx_16ab is an exception with a CODE table for both SNOMEDCT_US and RXNORM. They mostly do not overlap. I do remember that there were a couple of exceptions but, in the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001 and 68088000) *and* an RXNORM of 149.

•       The PREFTERM table will have only one entry for each CUI, right? Basically it's a one-to-one mapping between CUI and PREFTERM. Correct me if I am wrong in this understanding.

You are correct here also. It is a one-to-one mapping although the system appears to tolerate when the PREFTERM is missing.

Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130

NOTICE: This e-mail message and all attachments transmitted with it are intended solely for the use of the addressee and may contain legally privileged and confidential information. If the reader of this message is not the intended recipient, or an employee or agent responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution, copying, or other use of this message or its attachments is strictly prohibited. If you have received this message in error, please notify the sender immediately by replying to this message and please delete it from your computer.


On Mon, Jun 1, 2020 at 7:56 AM <Ab...@cognizant.com> wrote:
Thank you, Remy and Peter, for your responses. I hope you are doing well and staying safe during this lockdown period. Could you please help me with the queries below about creating an additional dictionary?


•       How do we create an additional dictionary? Did you mean using the UMLS tool to create .script files from the .RRF files?

•       How can we point the cTAKES application to multiple dictionaries? Currently only sno_rx_16ab is configured; how can I tweak it to point to multiple dictionaries simultaneously? Or did you mean creating a fresh dictionary with all the vocabularies and pointing cTAKES to just that one?

I believe Remy was explaining how to edit the existing dictionary, where I would deal with two scenarios: one with an existing CUI and the other with a non-existing CUI. Could you please resolve the queries below regarding the same?


•       So for these edits I will have to add INSERT statements to the respective tables in the sno_rx_16ab.script file, right? Do I need to make any other changes for these tokens to be reflected in cTAKES?

•       If it is a non-existing CUI, I can get the respective CUI and TUI from here  https://uts.nlm.nih.gov//metathesaurus.html  right?

•       Based on the source, I will have to add an entry to the respective table, right? Like SNOMED, RxNORM, ICD 10; and a CUI will belong to only one of them, not all. Correct me if I am wrong in this understanding.

•       The PREFTERM table will have only one entry for each CUI, right? Basically it's a one-to-one mapping between CUI and PREFTERM. Correct me if I am wrong in this understanding.


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028


From: Remy Sanouillet <re...@foreseemed.com>
Sent: Friday, May 29, 2020 9:25 PM
To: dev@ctakes.apache.org
Cc: user@ctakes.apache.org
Subject: Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

[External]
Hello Abad,

The short answer is, yes, the sno_rx_16ab can be "hacked". A couple of caveats are that any mistake can stop all recognition and you will lose all your mods on updates. So an additional dictionary is a recommended approach.

There are two cases. Either the CUI you are adding already exists and you are just adding a synonym. In that case, you only need to add one line:
INSERT INTO CUI_TERMS VALUES(CUI,RINDEX,TCOUNT,TEXT,RWORD)
where:

  *   CUI is the cui, 'nuff said
  *   TEXT is the tokenized lowercase string for the entry. In your case 'pap smear'. Most punctuation is a separate token. Single quotes are escaped by doubling them
  *   RWORD is the one token in TEXT that is the most indicative (least common) which will be used as the index in the lookup. In your case probably 'pap' since it is not as common as 'smear'
  *   RINDEX is the index of RWORD in TEXT. First token is 0 which is the case for 'pap'
  *   TCOUNT is the token count for TEXT. In your case, 2
So you would want to add:
INSERT INTO CUI_TERMS VALUES(200845,0,2,'pap smear','pap')
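
The column rules above can be sketched as a small helper. This is a minimal Python illustration, not part of cTAKES: the function name cui_terms_insert is hypothetical, the text is assumed to be already tokenized with single spaces between tokens, and RWORD is still chosen by hand because "least common" depends on the whole dictionary.

```python
def cui_terms_insert(cui, text, rword):
    """Build one INSERT line for CUI_TERMS from an already tokenized entry.

    cui   : numeric CUI as written in the .script file (e.g. 200845)
    text  : tokens separated by spaces (e.g. 'pap smear')
    rword : the most indicative token of text, used as the lookup index
    """
    tokens = text.lower().split()
    rword = rword.lower()
    if rword not in tokens:
        raise ValueError(f"{rword!r} is not a token of {text!r}")
    rindex = tokens.index(rword)  # index of RWORD in TEXT, first token is 0
    tcount = len(tokens)          # token count for TEXT

    def quote(value):
        # Single quotes are escaped by doubling them
        return "'" + value.replace("'", "''") + "'"

    return ("INSERT INTO CUI_TERMS VALUES("
            f"{cui},{rindex},{tcount},{quote(' '.join(tokens))},{quote(rword)})")

print(cui_terms_insert(200845, "Pap smear", "pap"))
```

For 'pap smear' with 'pap' as RWORD this reproduces the line given above.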

 If the entry is a non-existing one, you will need to add a few more lines. Their positions are unimportant as long as they are below the header lines (below the final "SET SCHEMA PUBLIC" line).

  1.  INSERT INTO TUI VALUES(CUI,TUI)
      One line for each TUI in the taxonomy.
  2.  INSERT INTO SNOMEDCT_US VALUES(CUI,SNOMED)
      Assuming you are adding a SNOMED code.
  3.  INSERT INTO PREFTERM VALUES(CUI,PREFTERM)
      where PREFTERM is the pretty string to describe the entry. It need not correspond to any indexed entry. It is used for display once the lookup has been successful.
That's it. Use at your own discretion. No guarantees.
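
The extra lines for a non-existing CUI can be generated the same way. The Python sketch below is only an illustration: the function names and all values (CUI 9999999, TUIs 60 and 61, code 123456) are made-up placeholders, and writing TUIs and codes as plain numbers is an assumption that should be checked against existing rows of the same tables. The new lines are appended at the end of the script text, which keeps them below the final "SET SCHEMA PUBLIC" line.

```python
def new_concept_lines(cui, tuis, snomed_codes, prefterm):
    """Return the extra INSERT lines for a CUI not yet in the dictionary."""
    lines = [f"INSERT INTO TUI VALUES({cui},{tui})" for tui in tuis]
    lines += [f"INSERT INTO SNOMEDCT_US VALUES({cui},{code})"
              for code in snomed_codes]
    escaped = prefterm.replace("'", "''")  # double any single quote
    lines.append(f"INSERT INTO PREFTERM VALUES({cui},'{escaped}')")
    return lines

def append_below_header(script_text, lines):
    """Append lines at the end of the script, i.e. below the header lines."""
    if "SET SCHEMA PUBLIC" not in script_text:
        raise ValueError("header line 'SET SCHEMA PUBLIC' not found")
    if not script_text.endswith("\n"):
        script_text += "\n"
    return script_text + "\n".join(lines) + "\n"

# Placeholder values only, not a real concept
extra = new_concept_lines(9999999, [60, 61], [123456], "Example's test")
print(append_below_header("SET SCHEMA PUBLIC\n", extra))
```

The CUI_TERMS line described earlier is still needed in addition to these, and working on a copy of the .script file limits the damage if a line is malformed.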


Rémy Sanouillet
NLP Engineer
remys@foreseemed.com<ma...@foreseemed.com>



ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130



On Fri, May 29, 2020 at 7:34 AM <Ab...@cognizant.com> wrote:
Hi Team,

We recently set up cTAKES 4.0.0 as the NLP engine for our profile. We have faced situations where some expected tokens are not picked up by cTAKES during clinical text extraction, so our first thought was to identify where the dictionary is configured and how it can be updated. After some code analysis, we found that the dictionary is configured in the path below, under ctakes/resources, for the sources RxNorm and SNOMEDCT_US.

We were able to open the HSQLDB database using the HSQLDB GUI and found that some of our required entries are already there. Coming specifically to our current problem: Pap Smear and Mammogram are two clinical terms that are not currently recognized by cTAKES in our profile.

•       If I look into the .script file, Pap Smear and Mammogram/Mammography are already present in the .script file and in the respective tables. Please find a snapshot below.

[Snapshot of the .script entries; image not preserved in this plain-text archive.]

But these were still not recognized by cTAKES. I see there are some filters working on top of the available dictionary entries (ctakes-gui and ctakes-gui-res). Could it be because of these filters that the tokens are not recognized as expected? Could you please tell us what exactly these filters do? This will also help us in future when we try to add new terms to the dictionary.

•       What are the steps if we need to add/edit entries in the existing dictionaries? I see we can add/edit the existing values in the .script files, but our primary doubt is: if I have a term 'xyz' to be added to the dictionary, how can I get the CUI and other values like TUI, RINDEX, TCOUNT and PREFTERM? Is it fine to give any random value for TUI/CUI/RINDEX/TCOUNT? I could also see options to create custom BSV dictionaries but couldn't find much documentation for them. Kindly advise which of the 3 options below is better.

o   Generate a custom dictionary using the MetamorphoSys UMLS installation tool (where we provide the sources ICD10, RxNORM and SNOMEDCT_US) and leverage the full set of .rrf files in the meta folder. Is this approach better if there are many entries to populate?

o   Add/edit the available dictionary sno_rx_16ab; in that case, how do we provide valid values for each column, like CUI, TUI, RINDEX, TCOUNT and PREFTERM? Is this approach better if there are only a few entries to populate?

o   Use a custom BSV; in that case, how should we add values to the custom BSV? Could you also provide a sample?

I found a Metathesaurus browser at the URL below, where I can search for terms and get the CUI and the respective source, like ICD/CPT/MDR. But I was still unable to get the other required attributes, like TUI, RINDEX, TCOUNT and PREFTERM. Could you please briefly explain what these attributes signify?

https://uts.nlm.nih.gov//metathesaurus.html

Kindly advise us on how to proceed and correct us if we went wrong somewhere. This would be of great help to us.

P.S.: We comply with the UMLS license.


Thanks & Regards
Abad Ayyub
Vnet: 406170 | Cell : +91-9447379028



Re: Building a new custom dictionary or Updating/Adding values to the existing dictionary in cTAKES

Posted by Remy Sanouillet <re...@foreseemed.com>.

Hi Abad,


> ·       How can we point the cTAKES application to multiple dictionaries?
> Currently only sno_rx_16ab is configured; how can I tweak it to point to
> multiple dictionaries simultaneously? Or did you mean creating a fresh
> dictionary with all the vocabularies and pointing cTAKES to just that one?


If you go back in the archive a bit, you should find a thread where I went
into detail on how to add multiple dictionaries. Combining all dictionaries
into a fresh dictionary is not recommended for obvious reasons. If you
can't find the thread, I will dig it up.



> ·       So for these edits I will have to add INSERT statements to the
> respective tables in the sno_rx_16ab.script file, right? Do I need to make
> any other changes for these tokens to be reflected in cTAKES?


Nope! That is all that is needed and next time you launch cTAKES, it should
recognize your new entries.

> ·       If it is a non-existing CUI, I can get the respective CUI and TUI
> from here  https://uts.nlm.nih.gov//metathesaurus.html  right?


Correct! Remember that the ontology has multiple-inheritance so you need to
grab all the TUIs for a given CUI.


> ·       Based on the source, I will have to add an entry to the respective
> table, right? Like SNOMED, RxNORM, ICD 10; and a CUI will belong to only
> one of them, not all. Correct me if I am wrong in this understanding.


That is also correct. And most of the time, the dictionaries only contain
one CODE table so it is not even a question. However, sno_rx_16ab is an
exception with a CODE table for both SNOMEDCT_US and RXNORM. They mostly do
not overlap. I do remember that there were a couple of exceptions but, in
the case where that happens, the metathesaurus will show it.
For example: 'Acebutolol' (CUI: C0000946) has two SNOMEDCT_US codes (372815001
and 68088000) **and** an RXNORM of 149.
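
As an illustration of a CUI with codes in both tables, the Acebutolol example can be turned into INSERT lines with a short Python sketch. Two things here are assumptions rather than confirmed cTAKES behavior: that the CUI is stored as the number left after dropping the leading 'C' (as the 200845 in the 'pap smear' example suggests), and that codes are written unquoted. The helper names are hypothetical.

```python
def numeric_cui(cui):
    """Turn 'C0000946' into 946 (assumed storage form of the CUI)."""
    if not (cui.startswith("C") and cui[1:].isdigit()):
        raise ValueError(f"not a UMLS CUI: {cui!r}")
    return int(cui[1:])

def code_inserts(cui, codes_by_table):
    """One INSERT line per (code table, code) pair for a single CUI."""
    number = numeric_cui(cui)
    return [f"INSERT INTO {table} VALUES({number},{code})"
            for table, codes in codes_by_table.items()
            for code in codes]

acebutolol = {"SNOMEDCT_US": [372815001, 68088000], "RXNORM": [149]}
for line in code_inserts("C0000946", acebutolol):
    print(line)
```

Comparing the generated lines with existing rows for a known CUI in the same .script file is a quick way to confirm the column formats before adding anything.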

> ·       The PREFTERM table will have only one entry for each CUI, right?
> Basically it's a one-to-one mapping between CUI and PREFTERM. Correct me
> if I am wrong in this understanding.


You are correct here also. It is a one-to-one mapping although the system
appears to tolerate when the PREFTERM is missing.

*Rémy Sanouillet*
NLP Engineer
remys@foreseemed.com <xx...@foreseemed.com>


ForeSee Medical, Inc.
12555 High Bluff Drive, Suite 100
San Diego, CA 92130


