Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID> on 2023/04/12 12:11:30 UTC

Re: cTAKES questions [EXTERNAL]

Hi John,

Good questions.  Unfortunately, I can't really say what is going on, as it seems that a lot of the information is in your images - 1000 words and all that.
Attachments and inserted images do not go through the dev@ email system.  Please copy/paste some plain text into this thread and we will try to help you.

The first "NOCODE" item might come from a table name mismatch in the database, e.g. "ICD-9" vs. "ICD_9", but that is a shot in the dark.

The second issue that you report is more concerning.  You are correct in that it is unexpected and most likely not a great thing to have happening.

Just in case it makes things easier, you can use another method for getting CUIs.  For instance, add the SemanticTableFileWriter to the end of your pipeline.  It will write one file per note and accepts the standard file writer parameter "SubDirectory", plus the parameter "TableType" with one of the values BSV, CSV, HTML, or TAB.

Sean

________________________________
From: JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>
Sent: Tuesday, April 11, 2023 11:45 PM
To: dev@ctakes.apache.org <de...@ctakes.apache.org>
Subject: cTAKES questions [EXTERNAL]

Hello,

I have a minor bug to report, and a question that may be a part of a major bug.



If I create a custom dictionary with multiple vocabularies and then run cTAKES using this custom dictionary, cTAKES will sometimes replace the vocabulary name with the name of the custom dictionary. An example is shown in the attached image1.png, which was run on the MIMIC dataset. I noticed that when I looked up C1548802, a CUI that had the incorrect vocabulary name inserted, in the UMLS Metathesaurus Browser, it had ‘NOCODE’ for the code. This only seemed to occur with CUIs from the MTH vocabulary. Is this something that can be fixed within cTAKES?



The question, and possibly a major bug: we ran the same dataset (50 MIMIC notes) twice, once with the custom dictionary with multiple vocabularies described in the attached image1.png, and once with a custom dictionary that only included the SNOMED vocabulary. Next, we filtered the output from the multiple-vocabulary dictionary to only include CUIs that were reported by SNOMED. The two outputs from cTAKES should have contained the same CUIs, but as can be seen in the attached Venn diagrams, some of the CUIs reported by cTAKES running the SNOMED-only dictionary were not reported by cTAKES running the multiple-vocabulary dictionary. Do you know why the two outputs would be different?



We’re running a user installation of cTAKES 4.0.0.1 via



./bin/runPiperFile.sh -p path/to/piperfile -l path/to/custom_dict.xml -i inputDir --xmiOut outputDir



And then extracting the CUIs from the output XMI files.
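
For readers following along, here is a minimal sketch of what that extraction step might look like in Python. It is not the script used in this thread. It assumes that concepts appear in the XMI as elements whose local name is UmlsConcept, carrying "cui" and "codingScheme" attributes; the namespace URI, the sample XMI, the placeholder CUIs, and the scheme names "SNOMEDCT_US" and "MTH" are all illustrative assumptions and should be checked against real output.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily trimmed XMI; real cTAKES output has many more elements.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore">
  <refsem:UmlsConcept xmi:id="1" codingScheme="SNOMEDCT_US" cui="CUI001"/>
  <refsem:UmlsConcept xmi:id="2" codingScheme="MTH" cui="CUI002"/>
  <refsem:UmlsConcept xmi:id="3" codingScheme="SNOMEDCT_US" cui="CUI001"/>
</xmi:XMI>"""


def cuis_by_scheme(xmi_text):
    """Map each codingScheme value to the set of CUIs reported under it."""
    found = defaultdict(set)
    for elem in ET.fromstring(xmi_text).iter():
        # Drop the "{namespace}" prefix so the exact namespace URI does not matter.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "UmlsConcept" and elem.get("cui"):
            found[elem.get("codingScheme", "")].add(elem.get("cui"))
    return dict(found)


print(cuis_by_scheme(SAMPLE_XMI))
# {'SNOMEDCT_US': {'CUI001'}, 'MTH': {'CUI002'}}
```

With one such mapping per run, the comparison described above reduces to a set difference between the "SNOMEDCT_US" entry of the SNOMED-only run and that of the multiple-vocabulary run.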



Please let me know if I should report this as an issue on the new GitHub repository instead of via email.



Thanks!



John Caskey



Re: cTAKES questions [EXTERNAL]

Posted by Thomas W Loehfelm <tw...@ucdavis.edu.INVALID>.
Depending on the configuration of the dictionary lookup (https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+Fast+Dictionary+Lookup), I think it is entirely possible that you will get some CUI differences under your scenario. For example, if you are:

  *   comparing dictionary A + B versus dictionary A alone, and
  *   dictionary A contains "cholecystitis": CUI001, and
  *   dictionary B contains "acute cholecystitis": CUI002, and "cholecystitis": CUI001, and
  *   your dictionary lookup is set to "Most Precise Terms Persistence", and
  *   your source document contains the phrase "acute cholecystitis", then
  *   when looking up against dictionaries A + B, you will return "CUI002" (i.e. only the more precise dictionary B term), and
  *   when looking up against dictionary A, you will return "CUI001"

Other configuration options can probably lead to similar slight differences, depending on the specific contents of the reference dictionaries and the config options in cTAKES, some of which might have defaults in one annotator or another.
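
The cholecystitis scenario in the list above can be sketched in a few lines of Python. This is a toy substring matcher written only to illustrate the idea of keeping the most precise (longest) term. It is not the cTAKES implementation, and the function names are hypothetical.

```python
def lookup(text, dictionary):
    """Return (start, end, cui) for every occurrence of a dictionary term in text."""
    hits = []
    for term, cui in dictionary.items():
        start = text.find(term)
        while start != -1:
            hits.append((start, start + len(term), cui))
            start = text.find(term, start + 1)
    return hits


def most_precise(hits):
    """Drop any hit whose span lies inside a strictly longer hit."""
    def inside(inner, outer):
        longer = (outer[1] - outer[0]) > (inner[1] - inner[0])
        return longer and outer[0] <= inner[0] and inner[1] <= outer[1]
    return [h for h in hits if not any(inside(h, other) for other in hits)]


def cuis(text, dictionary):
    """CUIs that survive the most-precise-term filter, sorted for display."""
    return sorted({cui for _, _, cui in most_precise(lookup(text, dictionary))})


dict_a = {"cholecystitis": "CUI001"}
dict_b = {"acute cholecystitis": "CUI002", "cholecystitis": "CUI001"}
note = "patient has acute cholecystitis"

print(cuis(note, {**dict_a, **dict_b}))  # ['CUI002']
print(cuis(note, dict_a))                # ['CUI001']
```

Filtering the A + B output down to codes that come from dictionary A would leave nothing for this note, while the A-only run reports CUI001, which is the kind of mismatch described in the original question.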