Posted to user@uima.apache.org by Valkyrie Savage <sa...@tk.informatik.tu-darmstadt.de> on 2009/06/22 13:34:24 UTC

CAS Multipliers and Pipeline Troubles

Hello, all,

I'm working on a project involving UIMA, and I've run into some difficulties that I can't figure out.  This is my first month working with UIMA, so I am admittedly not well-versed in all its components and interactions, but I'll try to describe my problem as best I can.  I'm running UIMA 2.2.2-incubating with Java 1.6 in Eclipse Ganymede.

The project involves processing rather large documents, and the in-house components that I'm using have difficulty reading in a book-length chunk of text at a time.  For this reason, I've developed a very simple CAS multiplier: it takes in a CAS that contains Segment annotations and generates a new CAS for each Segment.  This multiplier sits inside an aggregate AE, whose other components add a few new annotations.  At the end of the aggregate is a simple CAS demultiplier that merges the segment CASes back into one; it is based heavily on the example in org.apache.uima.examples.casMultiplier, except that I hardcoded the tags (annotation types) that I want copied into the merged CAS.
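For readers less familiar with this kind of setup, here is a rough plain-Java model of the split, annotate and merge flow described above. It is not UIMA code and uses no UIMA API: `Doc` stands in for a CAS, `Annotation` for a UIMA annotation, and `split`, `tagWords`, `merge` and `runAggregate` are hypothetical names for the CAS multiplier, an inner annotator, the demultiplier and the aggregate. It assumes the merger concatenates the segment texts and shifts annotation offsets, and that the Segments are contiguous and cover the whole document. It is written for a modern JDK (records need Java 16+), not the Java 1.6 mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Plain-Java model of the split -> annotate -> merge flow (NOT the UIMA API).
 * Doc stands in for a CAS, Annotation for a UIMA annotation; all names are hypothetical.
 */
public class SegmentSplitMerge {

    record Annotation(String type, int begin, int end) {}

    record Doc(String text, List<Annotation> annotations) {}

    /** Multiplier: one new child Doc per "Segment", holding only that segment's text. */
    static List<Doc> split(Doc parent) {
        List<Doc> children = new ArrayList<>();
        for (Annotation a : parent.annotations()) {
            if (a.type().equals("Segment")) {
                children.add(new Doc(parent.text().substring(a.begin(), a.end()), new ArrayList<>()));
            }
        }
        return children;
    }

    /** Inner annotator (hypothetical): adds a "Word" annotation per whitespace-separated token. */
    static Doc tagWords(Doc chunk) {
        List<Annotation> out = new ArrayList<>(chunk.annotations());
        String t = chunk.text();
        int i = 0;
        while (i < t.length()) {
            while (i < t.length() && Character.isWhitespace(t.charAt(i))) i++;
            int start = i;
            while (i < t.length() && !Character.isWhitespace(t.charAt(i))) i++;
            if (i > start) out.add(new Annotation("Word", start, i));
        }
        return new Doc(t, out);
    }

    /** Demultiplier: concatenates child texts and copies only whitelisted types, shifting offsets. */
    static Doc merge(List<Doc> children, Set<String> keepTypes) {
        StringBuilder text = new StringBuilder();
        List<Annotation> merged = new ArrayList<>();
        for (Doc child : children) {
            int offset = text.length(); // where this child's text starts in the merged text
            for (Annotation a : child.annotations()) {
                if (keepTypes.contains(a.type())) {
                    merged.add(new Annotation(a.type(), a.begin() + offset, a.end() + offset));
                }
            }
            text.append(child.text());
        }
        return new Doc(text.toString(), merged);
    }

    /** The whole aggregate: split the parent, annotate each child, merge into a NEW Doc. */
    static Doc runAggregate(Doc parent, Set<String> keepTypes) {
        List<Doc> tagged = new ArrayList<>();
        for (Doc child : split(parent)) {
            tagged.add(tagWords(child));
        }
        return merge(tagged, keepTypes);
    }

    public static void main(String[] args) {
        Doc parent = new Doc("Call me Ishmael. Some years ago", List.of(
                new Annotation("Segment", 0, 16),
                new Annotation("Segment", 16, 31),
                new Annotation("Heading", 0, 4))); // a tag the merge should strip
        Doc merged = runAggregate(parent, Set.of("Word"));
        System.out.println(merged.text());
        merged.annotations().forEach(System.out::println);
    }
}
```

In this model the merged result is a new object built from the children rather than the parent modified in place, so a later step only sees the stripped, re-tagged result if it is handed that new object instead of the parent.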

The problem is that the split CASes are tagged and merged correctly, but for whatever reason the merged CAS is not the one sent on through the rest of the pipeline after this aggregate AE.  A simple CAS printer at the end of my demultiplier's next() method shows that only the tags I wanted are retained after the merge, but the stripped tags reappear when I add an AnnotationWriter as the next step of the pipeline.  From what I read about Flow Controllers, the original CAS should be dropped from the pipeline by default once new CASes have been created from it (I am not using a user-defined Flow Controller), but that doesn't seem to be happening.  Downstream, none of the new tags added inside the aggregate AE are preserved, while all the tags that are supposed to be stripped out are.

If more information is needed, I'll be happy to provide it.  As I mentioned, I'm new to UIMA, and I'm not sure how to go about debugging this.

Thank you!

Valkyrie Savage

RE: CAS Multipliers and Pipeline Troubles

Posted by Valkyrie Savage <sa...@tk.informatik.tu-darmstadt.de>.

Sorry about the re-post.  My mail apparently got stuck on the server for a couple of days while I was joining the list.

Valkyrie

-----Original Message-----
From: Valkyrie Savage [mailto:savage@tk.informatik.tu-darmstadt.de]
Sent: Mon 6/22/2009 1:34 PM
To: uima-user@incubator.apache.org
Subject: CAS Multipliers and Pipeline Troubles