Posted to dev@ctakes.apache.org by Tomasz Oliwa <ol...@uchicago.edu> on 2015/11/19 18:07:39 UTC
TermConsumers
Hi,
How can I run a different TermConsumer on already generated CAS files?
I have CAS files created by the AggregatePlaintextFastUMLSProcessor with the DefaultTermConsumer set in cTakesHsql.xml.
Now I would like to apply the PrecisionTermConsumer to these CAS files without having to do the whole annotation process again. The IdentifiedAnnotations are all there; it is only a matter of removing them according to the TermConsumer's logic.
Is there a way to create a passthrough Processor that simply reads the CAS, applies a different TermConsumer and writes it to disk?
Or is there a different way to go about this?
Thanks for any help,
Tomasz
Re: TermConsumers
Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
I believe most of the Xmi/Xcas reader classes are just wrappers for UIMA
utilities; look at XCASDeserializer's static method deserialize:
https://uima.apache.org/d/uimaj-2.6.0/apidocs/
Tim
RE: TermConsumers
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Holy cattle, it worked ?!?
I don't know of a specific XCAS reader offhand ... have you tried running with the XMI reader? Some of the readers lying around will handle both.
RE: TermConsumers
Posted by Tomasz Oliwa <ol...@uchicago.edu>.
Sean,
I tested this and the Annotator itself works, great. The only change I had to make when writing the Annotator class with the code below was to provide explicit generics in:
static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES = Arrays.<Class<? extends IdentifiedAnnotation>>asList(
      MedicationMention.class, DiseaseDisorderMention.class,
      SignSymptomMention.class, LabMention.class, ProcedureMention.class );
At least on a small example XMI CAS, the behavior is as expected for the IdentifiedAnnotations.
However, for my use case I have XCAS files, not XMI CAS files. I can use XCasWriterCasConsumer to write the CAS files, but I cannot find any XCAS Collection Reader to read them in initially.
Is such a reader available?
Regards,
Tomasz
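The explicit type argument on Arrays.asList mentioned above can be tried in isolation. The following is only a minimal sketch using JDK classes: Number and its subclasses stand in for IdentifiedAnnotation and the mention classes, and TypeWitnessSketch and NUMBER_CLASSES are hypothetical names. Whether the explicit witness is strictly required probably depends on the compiler and language level in use, which is an assumption here; supplying it is harmless either way.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical demo class; Number stands in for IdentifiedAnnotation.
final class TypeWitnessSketch {

    // The explicit <Class<? extends Number>> witness tells the compiler which
    // element type to use instead of leaving it to inference.
    static final Collection<Class<? extends Number>> NUMBER_CLASSES
            = Arrays.<Class<? extends Number>>asList(
                    Integer.class, Double.class, Long.class);

    public static void main(String[] args) {
        // Iterate with the same wildcard type the collection declares.
        for (Class<? extends Number> numberClass : NUMBER_CLASSES) {
            System.out.println(numberClass.getSimpleName());
        }
    }
}
```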
RE: TermConsumers
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Tomasz,
I don't know that anybody has done this. However, you could try running a pipeline with items in ctakes-core:
XmiCollectionReaderCtakes to read your existing cas xmi files in directory
-- custom refiner AE below -- to remove unwanted umls annotations
XmiWriterCasConsumerCtakes to write the new cas xmi files
The refiner AE would basically do what the PrecisionTermConsumer of the fast lookup does, but over a pre-populated cas. This is mostly cut and paste from other code with a little bit of lookompiling - I haven't tested it at all! If you do give it a run-through and it works then let me know and I'll clean it up and check into sandbox.
static private final Collection<Class<? extends IdentifiedAnnotation>> EVENT_CLASSES = Arrays.asList(
      MedicationMention.class, DiseaseDisorderMention.class,
      SignSymptomMention.class, LabMention.class, ProcedureMention.class );
// Don't forget AnatomicalSiteMention.class and generic EntityMention.class!

static private final Function<Annotation,TextSpan> createTextSpan
      = annotation -> new DefaultTextSpan( annotation.getBegin(), annotation.getEnd() );

static private final Function<IdentifiedAnnotation,IdentifiedAnnotation> returnSelf = annotation -> annotation;

@Override
public void process( final JCas jcas ) throws AnalysisEngineProcessException {
   LOGGER.info( "Starting processing" );
   for ( Class<? extends IdentifiedAnnotation> eventClass : EVENT_CLASSES ) {
      refineForClass( jcas, eventClass );
   }
   final Collection<AnatomicalSiteMention> anatomicals = JCasUtil.select( jcas, AnatomicalSiteMention.class );
   final Collection<EntityMention> entityMentions = new ArrayList<>( JCasUtil.select( jcas, EntityMention.class ) );
   entityMentions.removeAll( anatomicals );
   refineForAnnotations( jcas, anatomicals );
   refineForAnnotations( jcas, entityMentions );
   LOGGER.info( "Finished processing" );
}

static private <T extends IdentifiedAnnotation> void refineForClass( final JCas jcas,
                                                                     final Class<T> eventClass ) {
   refineForAnnotations( jcas, JCasUtil.select( jcas, eventClass ) );
}

static private <T extends IdentifiedAnnotation> void refineForAnnotations( final JCas jcas,
                                                                           final Collection<T> annotations ) {
   final Map<TextSpan,IdentifiedAnnotation> annotationTextSpans
         = annotations.stream().collect( Collectors.toMap( createTextSpan, returnSelf ) );
   final Collection<TextSpan> unwantedSpans = getUnwantedSpans( annotationTextSpans.keySet() );
   unwantedSpans.stream().map( annotationTextSpans::get ).forEach( t -> t.removeFromIndexes( jcas ) );
}

static private Collection<TextSpan> getUnwantedSpans( final Collection<TextSpan> originalTextSpans ) {
   final List<TextSpan> textSpans = new ArrayList<>( originalTextSpans );
   final Collection<TextSpan> discardSpans = new HashSet<>();
   final int count = textSpans.size();
   for ( int i = 0; i < count; i++ ) {
      final TextSpan spanKeyI = textSpans.get( i );
      for ( int j = i + 1; j < count; j++ ) {
         final TextSpan spanKeyJ = textSpans.get( j );
         if ( (spanKeyJ.getBegin() <= spanKeyI.getBegin() && spanKeyJ.getEnd() > spanKeyI.getEnd())
              || (spanKeyJ.getBegin() < spanKeyI.getBegin() && spanKeyJ.getEnd() >= spanKeyI.getEnd()) ) {
            // J contains I, discard less precise concepts for span I and move on to next span I
            discardSpans.add( spanKeyI );
            break;
         }
         if ( ((spanKeyI.getBegin() <= spanKeyJ.getBegin() && spanKeyI.getEnd() > spanKeyJ.getEnd())
               || (spanKeyI.getBegin() < spanKeyJ.getBegin() && spanKeyI.getEnd() >= spanKeyJ.getEnd())) ) {
            // I contains J, discard less precise concepts for span J and move on to next span J
            discardSpans.add( spanKeyJ );
         }
      }
   }
   return discardSpans;
}
Good luck,
Sean
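For readers who want to try the span-containment rule from the message above without a cTAKES or UIMA setup, here is a minimal sketch using only the JDK. It is an illustration under stated assumptions, not the cTAKES implementation: SpanRefinerSketch, its nested Span class and strictlyContains are hypothetical names, and Span is a plain stand-in for the TextSpan type used in the message. The containment test is intended to be logically equivalent to the two-clause conditions in getUnwantedSpans.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class; not part of cTAKES.
final class SpanRefinerSketch {

    // Plain stand-in for a text span, with value-based equals/hashCode.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        // True when this span covers other and extends past it on at least one side.
        boolean strictlyContains(Span other) {
            return begin <= other.begin && end >= other.end
                    && (begin < other.begin || end > other.end);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Span && ((Span) o).begin == begin && ((Span) o).end == end;
        }

        @Override
        public int hashCode() {
            return 31 * begin + end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    // Returns every span that is strictly contained in another span of the input.
    static Set<Span> unwantedSpans(Collection<Span> spans) {
        List<Span> list = new ArrayList<>(spans);
        Set<Span> discard = new HashSet<>();
        for (int i = 0; i < list.size(); i++) {
            Span a = list.get(i);
            for (int j = i + 1; j < list.size(); j++) {
                Span b = list.get(j);
                if (b.strictlyContains(a)) {
                    discard.add(a); // a is the less precise span; stop scanning for it
                    break;
                }
                if (a.strictlyContains(b)) {
                    discard.add(b);
                }
            }
        }
        return discard;
    }

    public static void main(String[] args) {
        // Offsets loosely modelled on "colon cancer ... pain": the long span covers two shorter ones.
        List<Span> spans = Arrays.asList(
                new Span(0, 12), new Span(0, 5), new Span(6, 12), new Span(20, 24));
        System.out.println("Discarded: " + unwantedSpans(spans));
    }
}
```

One caveat when moving from spans back to annotations: the two-argument Collectors.toMap throws an IllegalStateException on duplicate keys, so if two annotations of the same type could share an identical span, a merge function or Collectors.groupingBy would be needed to build the span-to-annotation map.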