Posted to dev@ctakes.apache.org by Lani Uni <li...@informatik.uni-leipzig.de> on 2017/10/26 09:22:19 UTC

Can cTAKES output AUI?

Hello all,

I am using UMLS to annotate medical forms and would like to know which terminology (SAB in MRCONSO) is used by cTAKES for the generated annotation.
Is there a way to get this information (something like AUI)? 
I see in OntologyConcept one can get terminology using getCodingScheme, but this only gives me back info about UMLS.
On the other hand, getPreferredText in UmlsConcept seems to give me the PN or PT of a concept, but not the actual term used to generate the annotation.
Any idea to help me a step further?

best,
Lani

RE: Can cTAKES output AUI? [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Lani,

I am going to try to address your questions, but I'm not sure that this is the information that you want.

- The cTAKES 4.0 dictionary (sno_rx_16ab) is built using terms from the SNOMED CT and RxNorm terminologies as they exist in the 2016AB UMLS release.

- The codes from the two terminologies plus the terminology-neutral UMLS concept unique identifier (CUI) are stored in the dictionary.

- Custom dictionaries can be created using different terminologies.

- There are a lot of get..(..) methods in the OntologyConceptUtil class in ctakes-core.  http://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/util/OntologyConceptUtil.html
For instance, you can get the terminology/scheme names and codes for an annotation:
Map<String, Collection<String>> schemeAndItsCodes = OntologyConceptUtil.getSchemeCodes( annotation );
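
As a rough sketch of how such a scheme-to-codes map might be consumed, the Java below formats it into one line per terminology. It is self-contained and does not call cTAKES: the map is filled by hand, the codes are made-up placeholders, the scheme names "SNOMEDCT_US" and "RXNORM" are assumed, and the class name SchemeCodeReport and its describe method are hypothetical. In a real pipeline the map would come from the getSchemeCodes call above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of cTAKES.
class SchemeCodeReport {

    // Turns a scheme -> codes map into printable lines, sorted by scheme and code.
    static List<String> describe(Map<String, Collection<String>> schemeAndItsCodes) {
        List<String> lines = new ArrayList<>();
        Map<String, Collection<String>> sortedSchemes = new TreeMap<>(schemeAndItsCodes);
        for (Map.Entry<String, Collection<String>> entry : sortedSchemes.entrySet()) {
            Collection<String> sortedCodes = new TreeSet<>(entry.getValue());
            lines.add(entry.getKey() + ": " + String.join(", ", sortedCodes));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by OntologyConceptUtil.getSchemeCodes( annotation ).
        // Scheme names are assumed and the codes are placeholders, not real codes.
        Map<String, Collection<String>> schemeAndItsCodes = new HashMap<>();
        schemeAndItsCodes.put("SNOMEDCT_US", Arrays.asList("222", "111"));
        schemeAndItsCodes.put("RXNORM", Arrays.asList("9"));

        for (String line : describe(schemeAndItsCodes)) {
            System.out.println(line);
        }
    }
}
```

The keys of the map are what identify the source terminology for an annotation, so this may already cover the SAB part of the question without needing an AUI.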

- I am not sure that I am correctly interpreting your question, but you can get the 'actual term' used for annotation generation by calling .getCoveredText() on the annotation.

- The 'covered text' may not be the 'actual term' if:
   - lexical variants were used.  E.g. Terminology: "bitten by dog" ; covered text "bitten by dogs".
   - The overlap annotator was used.  Terminology: "bitten by dog" ; covered text "bitten by angry dog".

- The AUI is not stored.  There are simply too many of them (one for every row in MRCONSO), since each is unique to a combination of text and source terminology, plus each preferred term ...
For "dog bite", sno_rx_16ab has one preferred term, "Dog Bite", and 2 SNOMED CT codes.
MRCONSO has 4 AUIs for SNOMED CT because of code type combinations.  Add in the 5 other terminologies and "dog bite" ends up with 10 AUIs.
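
Since the AUI is not in the dictionary, one possible workaround is to look it up afterwards in MRCONSO.RRF using the CUI, scheme and code that cTAKES does provide. The sketch below assumes the standard pipe-delimited MRCONSO column order (CUI first, AUI eighth, SAB twelfth, CODE fourteenth, STR fifteenth). The class MrconsoAuiLookup, its methods, and every identifier in the sample rows are hypothetical placeholders, not real UMLS content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-processing helper, not part of cTAKES.
class MrconsoAuiLookup {

    // Zero-based column positions in a pipe-delimited MRCONSO.RRF row.
    static final int CUI = 0;
    static final int AUI = 7;
    static final int SAB = 11;
    static final int CODE = 13;
    static final int STR = 14;

    // Returns the AUI of every row matching the given CUI, source terminology and code.
    static List<String> findAuis(List<String> rows, String cui, String sab, String code) {
        List<String> auis = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split("\\|", -1); // -1 keeps empty columns
            if (fields.length <= STR) {
                continue; // skip malformed rows
            }
            if (fields[CUI].equals(cui) && fields[SAB].equals(sab) && fields[CODE].equals(code)) {
                auis.add(fields[AUI]);
            }
        }
        return auis;
    }

    // Made-up rows in MRCONSO layout; all identifiers are placeholders.
    static List<String> sampleRows() {
        return Arrays.asList(
            "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SNOMEDCT_US|PT|111|Dog bite|0|N||",
            "C0000001|ENG|S|L0000002|PF|S0000002|Y|A0000002||||SNOMEDCT_US|SY|111|Bite of dog|0|N||",
            "C0000001|ENG|S|L0000001|VC|S0000003|Y|A0000003||||EXAMPLE_SAB|PT|X1|dog bite|0|N||");
    }

    public static void main(String[] args) {
        // One CUI/scheme/code combination can still map to several AUIs.
        System.out.println(findAuis(sampleRows(), "C0000001", "SNOMEDCT_US", "111"));
    }
}
```

A lookup like this generally returns several candidate AUIs rather than one. Narrowing further by comparing STR with the covered text is possible in principle, but it will miss matches made through lexical variants or the overlap annotator, as described above.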

Let me know if you need something else.  It would help me to know your goal.

Sean
