Posted to user@ctakes.apache.org by "Lee, Richard A. [USA]" <le...@bah.com> on 2014/04/29 21:57:06 UTC

RE: [External] Re: Problems with TUI filtering and other annotation omissions

Thank you for that pointer. Unfortunately, org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation does not have the missing annotations.

I noticed that a later post to this list asked a similar question about adding TUIs to LookupDesc_Db.xml, and the answer was that the cTAKES code in UmlsToSnomedConsumerImpl only looks for certain TUI “groups”. That would explain why my shot in the dark of using “chemicalanddrugTuis” did not work. I changed the key to “medicationTuis”, as suggested by the code, and that did indeed cause most of the expected additional terms to be annotated.

So that partially answers my question. Two things remain mysteries: the phrases that are still missed even though their TUIs are in the added list, and the phrases that are still not annotated even though I added T058 to the existing “procedureTuis” group…
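
For anyone hitting the same key-name problem, here is a minimal sketch of a check for property keys the consumer might silently ignore. It is not cTAKES code. The set of recognized keys holds only the two group names mentioned in this thread, which is an assumption (the set hard-coded in UmlsToSnomedConsumerImpl may well be larger), and the <properties> wrapper element is hypothetical, added only so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Assumption: only the group keys named in this thread. The real set
# hard-coded in UmlsToSnomedConsumerImpl may be larger.
KNOWN_GROUP_KEYS = {"procedureTuis", "medicationTuis"}

# The two properties from the original message, wrapped in a hypothetical
# <properties> root so that the fragment is well-formed by itself.
fragment = """
<properties>
  <property key="procedureTuis" value="T058,T059,T060,T061"/>
  <property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>
</properties>
"""

def unrecognized_keys(xml_text, known=KNOWN_GROUP_KEYS):
    """Return the *Tuis property keys that are not in the known set."""
    root = ET.fromstring(xml_text)
    keys = [p.get("key") for p in root.iter("property")]
    return [k for k in keys if k and k.endswith("Tuis") and k not in known]

print(unrecognized_keys(fragment))  # ['chemicalanddrugTuis']
```

Running something like this against an edited descriptor would at least flag a made-up group name before restarting cTAKES.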

----

From: Pei Chen [mailto:chenpei@apache.org]
Sent: Fri, 04 Apr, 2014 16:33
To: user@ctakes.apache.org
Subject: [External] Re: Problems with TUI filtering and other annotation omissions

Richard,
org.apache.ctakes.assertion.medfacts.types.Concept is an internal type used by the assertion module.
Could you see what is returned in org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation?


On Fri, Apr 4, 2014 at 3:56 PM, Lee, Richard A. [USA] <le...@bah.com> wrote:
I ran several documents through cTAKES, using AggregatePlaintextUMLSProcessor, and examined the list of org.apache.ctakes.assertion.medfacts.types.Concept annotations produced for each. From those results, I made up a list of phrases I had hoped cTAKES would annotate but did not. I used MetaMap to look up each of those phrases, and found that approximately 150 of them resulted in a full-phrase match and a corresponding CUI.

I used the MetamorphoSys scripts to load the UMLS RRF data set into a SQL DB, and queried the DB to confirm that those ~150 phrases were indeed present with the expected CUIs. So, the question becomes, why didn’t cTAKES annotate them?
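
A minimal sketch of that kind of confirmation query, using an in-memory SQLite table in place of the real database. The MRCONSO table and its CUI, LAT and STR columns are standard UMLS names, but the table here is a toy subset, and the CUIs and strings are made-up placeholders rather than real UMLS content.

```python
import sqlite3

# Toy stand-in for MRCONSO; the real table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MRCONSO (CUI TEXT, LAT TEXT, STR TEXT)")

# Placeholder rows: these CUIs and strings are invented for illustration.
conn.executemany(
    "INSERT INTO MRCONSO VALUES (?, ?, ?)",
    [
        ("C9000001", "ENG", "example drug"),
        ("C9000002", "ENG", "example procedure"),
    ],
)

def lookup(conn, phrase):
    """Return the sorted CUIs whose English string matches the phrase exactly
    (ignoring case)."""
    rows = conn.execute(
        "SELECT DISTINCT CUI FROM MRCONSO "
        "WHERE LAT = 'ENG' AND LOWER(STR) = LOWER(?) "
        "ORDER BY CUI",
        (phrase,),
    )
    return [cui for (cui,) in rows]

print(lookup(conn, "Example Drug"))    # ['C9000001']
print(lookup(conn, "missing phrase"))  # []
```

An empty result for a phrase would mean it is not in the loaded UMLS subset at all, which rules out the filters as the cause for that phrase.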

Looking at the cTAKES logs, it appears the OrangeBookFilter “Filtered out” only 5 out of the 150.

The other possible cause I could think of was the TUI filtering; there was no evidence of it in the logs, but I don’t know whether the results of filtering in that step get logged by default or not. I looked up in the DB the TUIs for each of the phrases, compared them to the lists of “allowed” TUIs in LookupDesc_Db.xml, and concluded that the TUI filtering might account for 44 of the phrases. So the rest remain a mystery.
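
The comparison can be sketched roughly as follows. The phrase-to-TUI map is hypothetical, and the allowed lists are reduced to the single “procedureTuis” group as it stood before my change; a real LookupDesc_Db.xml defines more groups than this.

```python
# Allowed TUIs per group. Only procedureTuis is shown, with the values it had
# before T058 was added; a real LookupDesc_Db.xml defines more groups.
allowed = {
    "procedureTuis": {"T059", "T060", "T061"},
}

# Hypothetical phrase -> TUIs map, as it might come out of the DB lookup.
phrase_tuis = {
    "example procedure": {"T058"},
    "example lab test": {"T059"},
    "example drug": {"T109", "T121"},
}

def blocked_by_tui_filter(phrase_tuis, allowed):
    """Return the phrases that have no TUI in any allowed list."""
    permitted = set().union(*allowed.values())
    return sorted(p for p, tuis in phrase_tuis.items() if not (tuis & permitted))

print(blocked_by_tui_filter(phrase_tuis, allowed))
# ['example drug', 'example procedure']
```

A phrase counts as blocked only when none of its TUIs is permitted, since a concept with several semantic types needs just one of them to be on a list.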

I modified the TUI lists in LookupDesc_Db.xml to add TUIs, hoping that would cause the corresponding phrases to be annotated. Specifically, I added T058 to one list, and added a second list with a handful of TUIs:

<property key="procedureTuis" value="T058,T059,T060,T061"/>
<property key="chemicalanddrugTuis" value="T109,T110,T116,T121,T123"/>

T058 corresponded to 3 of the phrases on my list; T121 alone accounted for 24 of them. But, upon restarting cTAKES with that modified file, and running relevant documents, I found that the expected phrases were still not annotated. I even tried making the same change in LookupDesc.xml just in case, to no avail.

So, the questions are:

- Are there reasons beyond the OrangeBook and TUI filters why CUI-associated phrases in UMLS would not get annotated?

- Do TUI-filter results get logged by default, and if not, is there a way (log4j settings?) to log them without making code changes?

- Am I doing the TUI filter changes wrong?

Thanks for any answers and advice.