Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID> on 2023/02/02 19:53:29 UTC

Re: CUI Question [EXTERNAL]

Hi John,

Each annotation gets a unique concept for every combination of possible codes, semantic types, etc.
You have pasted a good example of when that happens:  (abbreviated)

<code="7092007" tui="T109"/>
<code="7092007" tui="T121"/>
<code="372826007" tui="T109"/>
<code="372826007" tui="T121"/>

This is definitely a little confusing when the CUI for all 4 'unique' concepts is the same, in your case cui="C0025859".
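
To make the "every combination" point concrete, here is a minimal sketch in plain Java. It is not cTAKES code: the class ConceptCombinationSketch and its combine method are hypothetical names made up for illustration, and the entries are plain strings rather than UmlsConcept objects. It only shows how two codes and two TUIs under one CUI turn into four entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the concept entries discussed above; not a cTAKES class.
final class ConceptCombinationSketch {

    /** Builds one "cui|code|tui" entry per (code, tui) pair, all sharing the given CUI. */
    static List<String> combine(String cui, List<String> codes, List<String> tuis) {
        List<String> entries = new ArrayList<>();
        for (String code : codes) {
            for (String tui : tuis) {
                entries.add(cui + "|" + code + "|" + tui);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Values taken from the abbreviated example above.
        List<String> entries = combine("C0025859",
                Arrays.asList("7092007", "372826007"),
                Arrays.asList("T109", "T121"));
        for (String entry : entries) {
            System.out.println(entry);
        }
        // 2 codes x 2 TUIs = 4 entries, yet only one distinct CUI.
        System.out.println(entries.size() + " entries");
    }
}
```

With the values from the example this yields four entries that differ only in code and TUI, which is why a single mention of Metoprolol carries four concepts with the same CUI.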

If you are interested in gathering annotations, cuis, codes, concepts, semantic types etc. you should consider using the OntologyConceptUtil in ctakes-core.
https://ctakes.apache.org/apidocs/4.0.0/org/apache/ctakes/core/util/OntologyConceptUtil.html

As far as I can tell, methods with application to your question would be:

getAnnotationsByCui( jCas, "C0025859" )
  --> which would return 3 annotations given your example.

getCuiCounts(  jCas )
  --> which would return a Map<String,Long> where  the cui is the key (String) and the # of annotations with that cui is the value (Long).  In your case this should be "C0025859", 3.
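
The difference between the two counts in the question (12 versus 3) can be sketched in plain Java. This is only an illustration of the counting logic, not the OntologyConceptUtil implementation: CuiCountSketch, countEntries and countAnnotations are hypothetical names, and each annotation is modeled simply as the list of CUIs found in its concept array.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each inner list holds the CUIs of one annotation's concept entries.
final class CuiCountSketch {

    /** Counts every concept entry, as a walk over all UmlsConcept tags would. */
    static Map<String, Long> countEntries(List<List<String>> cuisPerAnnotation) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> cuis : cuisPerAnnotation) {
            for (String cui : cuis) {
                counts.merge(cui, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Counts each CUI at most once per annotation. */
    static Map<String, Long> countAnnotations(List<List<String>> cuisPerAnnotation) {
        Map<String, Long> counts = new TreeMap<>();
        for (List<String> cuis : cuisPerAnnotation) {
            // De-duplicate within the annotation before counting.
            for (String cui : new LinkedHashSet<>(cuis)) {
                counts.merge(cui, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three mentions of Metoprolol, each with four concept entries for the same CUI.
        List<String> oneMention = Collections.nCopies(4, "C0025859");
        List<List<String>> document = Arrays.asList(oneMention, oneMention, oneMention);
        System.out.println(countEntries(document));     // prints {C0025859=12}
        System.out.println(countAnnotations(document)); // prints {C0025859=3}
    }
}
```

The per-annotation count is the one that matches the three mentions in the document text.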

There are around 35 methods, so hopefully you can find some that fit your needs.

Unless you really need something special, parsing the XMI files is probably not the best way to get information.


Sean


________________________________
From: JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>
Sent: Thursday, February 2, 2023 1:58 PM
To: dev@ctakes.apache.org <de...@ctakes.apache.org>
Subject: CUI Question [EXTERNAL]


Hello,
I’ve run into a problem and a question when running cTAKES. If I have a document and process it through cTAKES, then the XMI output will contain numerous XML tags. The tags our lab is interested in are the CUIs, for example, the XMI tag

<refsem:UmlsConcept xmi:id="16626" codingScheme="SNOMEDCT_US" code="7092007" score="0.0" disambiguated="false" cui="C0025859" tui="T109" preferredText="Metoprolol-containing product"/>

would indicate that the CUI C0025859 for Metoprolol-containing product is found in a given document.

If I look at the input document text, then I can locate three instances of the drug Metoprolol in the document text. When I look at the cTAKES XMI output in the cTAKES XMI CVD viewer, each of the results for Metoprolol is part of ontologyConceptArr, with 4 members each, looking like this:

// found at org.apache.ctakes.typesystem.type.textsem.EventMention
//       org.apache.ctakes.typesystem.type.textsem.MedicationMention
//           ontologyConceptArr = uima.cas.FSArray[4]

<refsem:UmlsConcept xmi:id="16626" codingScheme="SNOMEDCT_US" code="7092007" score="0.0" disambiguated="false" cui="C0025859" tui="T109" preferredText="Metoprolol-containing product"/>
<refsem:UmlsConcept xmi:id="16646" codingScheme="SNOMEDCT_US" code="7092007" score="0.0" disambiguated="false" cui="C0025859" tui="T121" preferredText="Metoprolol-containing product"/>
<refsem:UmlsConcept xmi:id="16616" codingScheme="SNOMEDCT_US" code="372826007" score="0.0" disambiguated="false" cui="C0025859" tui="T109" preferredText="Metoprolol-containing product"/>
<refsem:UmlsConcept xmi:id="16636" codingScheme="SNOMEDCT_US" code="372826007" score="0.0" disambiguated="false" cui="C0025859" tui="T121" preferredText="Metoprolol-containing product"/>

Although not shown here, it is possible for there to be different CUIs within a single uima.cas.FSArray, with this array mapping to a single string of text in the document.

If I walk the XMI file and retrieve all CUIs, then the result will be the CUI C0025859 being found 12 times. However, if I extend the JCasAnnotator_ImplBase Java class to extract the CUIs from the jCas annotations, then it only finds this CUI 3 times.

If part of the output needs to include a count of all CUIs found by cTAKES within a given document, which method is correct?

Thanks!


John Caskey, PhD
Senior Data Scientist
Department of Medicine
University of Wisconsin-Madison



Re: CUI Question [EXTERNAL]

Posted by JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>.
Awesome, thank you very much!


Re: CUI Question [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi John,

To start, I would use XmiTreeReader to read in the XMI file(s).  It can read a single file or multiple files in a directory tree.
In your piper file, just add the line:  "reader XmiTreeReader"

XmiTreeReader will automatically store metadata like doc ID, and file path.

The file path is stored in the cas as a DocumentPath type.  To get it, you can use code like the following (stolen from AbstractFileWriter):

   /**
    * @param jCas ye olde
    * @return the full path to the file containing the processed text, or an empty string ("") if unknown
    */
   protected String getSourceFilePath( final JCas jCas ) {
      final Collection<DocumentPath> documentPaths = JCasUtil.select( jCas, DocumentPath.class );
      if ( documentPaths == null || documentPaths.isEmpty() ) {
         return "";
      }
      for ( DocumentPath documentPath : documentPaths ) {
         final String path = documentPath.getDocumentPath();
         if ( path != null && !path.isEmpty() ) {
            return path;
         }
      }
      return "";
   }
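
Since the question was about the file name rather than the full path, the string returned above could be reduced to its last element with the standard java.nio.file API. This is a minimal sketch under assumptions: SourceFileNameSketch and fileNameOf are hypothetical names, and the path in main is a made-up example, not something cTAKES produces.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; it only post-processes the path string, it does not touch the CAS.
final class SourceFileNameSketch {

    /** Returns the last path element, or "" when the path is null, empty, or a bare root. */
    static String fileNameOf(String sourceFilePath) {
        if (sourceFilePath == null || sourceFilePath.isEmpty()) {
            return "";
        }
        // getFileName() returns null for a path with no elements, such as a root.
        Path fileName = Paths.get(sourceFilePath).getFileName();
        return fileName == null ? "" : fileName.toString();
    }

    public static void main(String[] args) {
        // Made-up path standing in for the value returned by getSourceFilePath( jCas ).
        System.out.println(fileNameOf("/data/notes/note_001.xmi")); // prints note_001.xmi
    }
}
```

The empty-string check mirrors the "unknown" case of getSourceFilePath, so the two can be chained without extra null handling.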


Sean

________________________________
From: JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>
Sent: Thursday, February 2, 2023 4:44 PM
To: dev@ctakes.apache.org <de...@ctakes.apache.org>
Subject: Re: CUI Question [EXTERNAL]


Hi Sean,
I’ve got one additional question, sorry if it’s a naive one. If I send cTAKES an XMI file as input, is the file name stored anywhere in the jCas object that is created?

The problem I’m running into is I’m trying to retrieve the file name via

File noteFile = new File(ViewUriUtil.getURI(jCas).toString());

But this throws

org.apache.uima.cas.CASRuntimeException: No sofaFS with name UriView found

Thanks again,
John


Re: CUI Question [EXTERNAL]

Posted by JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>.
Hi Sean,
I’ve got one additional question, sorry if it’s a naive one. If I send cTAKES an XMI file as input, is the file name stored anywhere in the jCas object that is created?

The problem I’m running into is I’m trying to retrieve the file name via

File noteFile = new File(ViewUriUtil.getURI(jCas).toString());

But this throws

org.apache.uima.cas.CASRuntimeException: No sofaFS with name UriView found

Thanks again,
John
