Posted to user@ctakes.apache.org by Michael Trepanier <mi...@metistream.com> on 2018/02/13 19:42:03 UTC

GetOriginalText() Always Returning NULL for Identified Annotations

Hi,

I am attempting to run the default FastPipeline to extract various features
from clinical text. One of the features I'd like to capture is the covered
text. However, when running the Scala code below, calling getOriginalText()
returns null for every annotation of type IdentifiedAnnotation. Is
this by design?

And if so, what would be a better way to extract the covered text? The
other features I need (subject, polarity, confidence, historyOf, and
snomed/CUI/TUI/PreferredText) I can acquire just fine. Effectively, the
goal here is to capture every identified annotation, its relevant metadata,
and the original text (I'm only showing my attempt at getting the covered
text below).

def main(args: Array[String]) {
  val note =
    """
     ... (Some long  example note.)
    """.stripMargin
  val aed = ClinicalPipelineFactory.getDefaultPipeline
  val ae = AnalysisEngineFactory.createEngine(aed)
  val jcas =
    JCasFactory.createJCas("org.apache.ctakes.typesystem.types.TypeSystem")
  jcas.setDocumentText(note)
  ae.process(jcas)
  val index = jcas.getAnnotationIndex(IdentifiedAnnotation.`type`)
  val iter = index.iterator()
  while (iter.hasNext) {
    val annotation = iter.next().asInstanceOf[IdentifiedAnnotation]
    val fsArray = annotation.getOriginalText()
    if (fsArray != null) {
      for (featureStructure <- fsArray.toArray()) {
        val featureArray = featureStructure.getType().getFeatures()
        val strings = featureArray.map(x => featureStructure.getStringValue(x))
        println(strings)
      }
    }
  }
}


Regards,

Mike Trepanier
-- 
Mike Trepanier| Big Data Engineer | MetiStream, Inc. |  mike@metistream.com |
845 - 270 - 3129 (m) | www.metistream.com

Re: GetOriginalText() Always Returning NULL for Identified Annotations

Posted by Michael Trepanier <mi...@metistream.com>.
Thanks Jessica - that does exactly what I need.

On Tue, Feb 13, 2018 at 1:49 PM, Jessica Glover <gl...@gmail.com>
wrote:

> Hi Mike,
>
> Have you tried the getCoveredText() method that IdentifiedAnnotation
> inherits from Annotation?
>
> - Jessica



Re: GetOriginalText() Always Returning NULL for Identified Annotations

Posted by Jessica Glover <gl...@gmail.com>.
Hi Mike,

Have you tried the getCoveredText() method that IdentifiedAnnotation
inherits from Annotation?
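
For reference, a minimal sketch of that suggestion in Scala. It assumes the
JCas has already been processed by the pipeline (as in the original post) and
that uimaFIT and the cTAKES type system are on the classpath. The object name
CoveredTextSketch and the helper coveredSpans are hypothetical, introduced
only for illustration; getCoveredText, getBegin, and getEnd come from the UIMA
Annotation class that IdentifiedAnnotation extends.

```scala
import org.apache.ctakes.typesystem.`type`.textsem.IdentifiedAnnotation
import org.apache.uima.fit.util.JCasUtil
import org.apache.uima.jcas.JCas
import scala.collection.JavaConverters._

object CoveredTextSketch {
  // Hypothetical helper: collects (begin, end, coveredText) for every
  // IdentifiedAnnotation indexed in an already-processed JCas.
  def coveredSpans(jcas: JCas): Seq[(Int, Int, String)] =
    JCasUtil.select(jcas, classOf[IdentifiedAnnotation]).asScala.toSeq.map { a =>
      // getCoveredText is inherited from Annotation; it returns the
      // document text between the annotation's begin and end offsets.
      (a.getBegin, a.getEnd, a.getCoveredText)
    }
}
```

In the loop from the original post, this would presumably replace the
getOriginalText() branch: after ae.process(jcas), something like
CoveredTextSketch.coveredSpans(jcas).foreach(println) prints each span with
its offsets.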

- Jessica
