Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2018/12/31 21:10:43 UTC

Re: SemanticCleanupTermConsumer [EXTERNAL] [SUSPICIOUS]

Hi Tim,

SemanticCleanupTermConsumer does something slightly different than what you are asking.

Like PrecisionTermConsumer, it lets longer terms of a semantic group subsume fully enclosed terms of the same semantic group.  Overlapping terms are not subsumed, only those that are fully covered.

SemanticCleanupTermConsumer performs the additional step of allowing Disease/Disorders to subsume enclosed Sign/Symptoms.  So, for the given text
"metastatic cancer":
- PrecisionTermConsumer would report "metastatic" as Finding, T169 and "metastatic cancer" as Disorder [Metastatic Neoplasm], T191.
- SemanticCleanupTermConsumer would only report "metastatic cancer" as Disorder [Metastatic Neoplasm], T191.
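
A minimal sketch of the two rules above in plain Java, in case it helps to see them side by side. Nothing here is cTAKES code: the Term class, the Group enum, and the method names are hypothetical stand-ins, and it assumes the "metastatic" Finding falls in the Sign/Symptom group, as the example implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types for illustration; these are NOT the cTAKES classes.
final class SubsumptionSketch {

    // Illustrative groups only; cTAKES defines its own semantic groups.
    enum Group { SIGN_SYMPTOM, DISEASE_DISORDER }

    static final class Term {
        final int begin;
        final int end;
        final Group group;
        final String text;

        Term(int begin, int end, Group group, String text) {
            this.begin = begin;
            this.end = end;
            this.group = group;
            this.text = text;
        }

        // True when 'other' lies entirely inside this term and is strictly shorter.
        boolean fullyEncloses(Term other) {
            return begin <= other.begin && other.end <= end
                    && (end - begin) > (other.end - other.begin);
        }
    }

    // Same-group rule, as described for PrecisionTermConsumer.
    static boolean precisionSubsumes(Term longer, Term shorter) {
        return longer.group == shorter.group && longer.fullyEncloses(shorter);
    }

    // Same-group rule plus Disease/Disorder subsuming an enclosed Sign/Symptom,
    // as described for SemanticCleanupTermConsumer.
    static boolean cleanupSubsumes(Term longer, Term shorter) {
        if (precisionSubsumes(longer, shorter)) {
            return true;
        }
        return longer.group == Group.DISEASE_DISORDER
                && shorter.group == Group.SIGN_SYMPTOM
                && longer.fullyEncloses(shorter);
    }

    // Returns the text of every term that no other term subsumes.
    static List<String> keep(List<Term> terms, boolean cleanup) {
        List<String> kept = new ArrayList<>();
        for (Term candidate : terms) {
            boolean subsumed = false;
            for (Term other : terms) {
                boolean hit = cleanup
                        ? cleanupSubsumes(other, candidate)
                        : precisionSubsumes(other, candidate);
                if (other != candidate && hit) {
                    subsumed = true;
                    break;
                }
            }
            if (!subsumed) {
                kept.add(candidate.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Term> terms = Arrays.asList(
                new Term(0, 10, Group.SIGN_SYMPTOM, "metastatic"),
                new Term(0, 17, Group.DISEASE_DISORDER, "metastatic cancer"));
        System.out.println(keep(terms, false)); // [metastatic, metastatic cancer]
        System.out.println(keep(terms, true));  // [metastatic cancer]
    }
}
```

Partially overlapping terms never satisfy fullyEncloses(..), so both rules leave them alone.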

My apologies: All of the class names in that module were made way back when I thought that ctakes (and the older dl module) had a rigid naming scheme.  Hence the long and not very self-descriptive names.

Do you want only "longest-covering" annotations regardless of semantic type?  That should be easy to do after dictionary lookup.  But if you want to use a consumer in the dl sub-pipeline, then the simplest thing to do would be:
1.  Make a copy of DefaultTermConsumer
2.  In that class, override consumeHits(..)
3.  In the override, basically copy what is in AbstractTermConsumer but don't loop through semantic types - just lump all annotations in a single Map and call consumeTypeIdHits(..)
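
The effect of step 3 can be sketched in plain Java, using the "colon cancer" example. This is only an illustration of the idea, not the actual override: Span, keepLongest, perGroup, and lumped are hypothetical names, the group labels are made up, and none of the real cTAKES signatures (consumeHits, consumeTypeIdHits) are reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; these are NOT the cTAKES classes or signatures.
final class LumpedHitsSketch {

    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }

        // True when 'other' lies entirely inside this span and is strictly shorter.
        boolean fullyEncloses(Span other) {
            return begin <= other.begin && other.end <= end
                    && (end - begin) > (other.end - other.begin);
        }
    }

    // Drops every span that a longer span in the same list fully encloses.
    // Spans with identical offsets are all kept.
    static List<String> keepLongest(List<Span> spans) {
        List<String> kept = new ArrayList<>();
        for (Span candidate : spans) {
            boolean enclosed = false;
            for (Span other : spans) {
                if (other.fullyEncloses(candidate)) {
                    enclosed = true;
                    break;
                }
            }
            if (!enclosed) {
                kept.add(candidate.text);
            }
        }
        return kept;
    }

    // Looping over semantic groups: each group is cleaned up on its own.
    static List<String> perGroup(Map<String, List<Span>> hitsByGroup) {
        List<String> kept = new ArrayList<>();
        for (List<Span> groupSpans : hitsByGroup.values()) {
            kept.addAll(keepLongest(groupSpans));
        }
        return kept;
    }

    // No loop over groups: everything is lumped together and cleaned up once.
    static List<String> lumped(Map<String, List<Span>> hitsByGroup) {
        List<Span> all = new ArrayList<>();
        for (List<Span> groupSpans : hitsByGroup.values()) {
            all.addAll(groupSpans);
        }
        return keepLongest(all);
    }

    // Example hits for the text "colon cancer"; the group labels are made up.
    static Map<String, List<Span>> colonCancerExample() {
        Map<String, List<Span>> hits = new LinkedHashMap<>();
        hits.put("AnatomicalSite", Arrays.asList(new Span(0, 5, "colon")));
        hits.put("DiseaseDisorder", Arrays.asList(
                new Span(6, 12, "cancer"),
                new Span(0, 12, "colon cancer")));
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<Span>> hits = colonCancerExample();
        System.out.println(perGroup(hits)); // [colon, colon cancer]
        System.out.println(lumped(hits));   // [colon cancer]
    }
}
```

The only difference between perGroup and lumped is whether the cleanup runs once per group or once over everything, which is the change step 3 describes.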

Sean



________________________________________
From: Miller, Timothy <Ti...@childrens.harvard.edu>
Sent: Monday, December 31, 2018 2:42 PM
To: dev@ctakes.apache.org
Subject: SemanticCleanupTermConsumer [EXTERNAL] [SUSPICIOUS]

Sean (and team),
I was using PrecisionTermConsumer for my ctakes-web-rest implementation hoping to avoid any overlaps at all, but when I saw some overlaps I noticed the comment:
PrecisionTermConsumer will only persist only the longest overlapping span of any semantic group.

So with this term consumer, "colon cancer" goes from 3 spans (colon, cancer, colon cancer) to 2 (colon, colon cancer) since cancer and colon cancer have the same semantic group. But if I want it to go to 1 (colon cancer), is that what SemanticCleanupTermConsumer does?

Tim