Posted to user@uima.apache.org by Swirl <lr...@gmail.com> on 2013/11/22 08:01:53 UTC

Running CasMultiplier inside a JCasIterable

I have successfully used a CasMultiplier to split up a document into segments
for further processing using SimplePipeline.runPipeline().
I did this by wrapping the CasMultiplier and the succeeding Annotator within an
aggregate.

But when I simply switch from SimplePipeline.runPipeline() to a
JCasIterable, the code no longer runs correctly, i.e., it returns only one CAS
per physical document instead of the segments that I
expected.

How can I get CasMultiplier to work with a JCasIterable?

Re: Running CasMultiplier inside a JCasIterable

Posted by Richard Eckart de Castilho <re...@apache.org>.

No, the issue is still open. 

When I start working on one of the issues that are still recorded on Google Code, I open a corresponding issue on the Apache Jira and add a link to each of them, pointing to each other. I also set the ASFJira flag on the Google Code tracker to true.

-- Richard

On 05.12.2013, at 02:07, Swirl <lr...@gmail.com> wrote:

> 
>> Option 2 - let UIMA do the heavy lifting
>> 
>> An alternative and much simpler approach might be to create an aggregate which
>> contains not only the engines, but also the reader. Then you don't have to
>> worry about the reader anymore at all. Just create a UIMA JCasIterator and 
>> poll CASes from that until it is empty. Some additional info may be found in
>> the legacy issue 89 [1].
>> 
> 
> Hi Richard,
> Is the code in issue 89 implemented in uimafit 2.0.0?
> It does not work in uimafit 1.4.0, which I currently have.

Re: Running CasMultiplier inside a JCasIterable

Posted by Swirl <lr...@gmail.com>.

 
> Option 2 - let UIMA do the heavy lifting
> 
> An alternative and much simpler approach might be to create an aggregate which
> contains not only the engines, but also the reader. Then you don't have to
> worry about the reader anymore at all. Just create a UIMA JCasIterator and 
> poll CASes from that until it is empty. Some additional info may be found in
> the legacy issue 89 [1].
> 

Hi Richard,
Is the code in issue 89 implemented in uimafit 2.0.0?
It does not work in uimafit 1.4.0, which I currently have.

Re: Running CasMultiplier inside a JCasIterable

Posted by Richard Eckart de Castilho <re...@apache.org>.

Option 1 - by hand:

I guess the uimaFIT JCasIterator should continue to read CAS by CAS
from the reader. However, for each CAS read by the reader, it should be
able to return 0-x CASes. Currently it can only return 1 because it
calls engine.process(jCas) on each engine in turn. To return 0-x, I think
it would have to create a single aggregate engine from all the engines, call
engine.processAndOutputNewCASes(jCas) on that, and handle the UIMA JCasIterator
that is returned by it (sorry for two classes having the same name here…).

The UIMA JCasIterator would need to become part of the uimaFIT JCasIterator state.
Special handling needs to be introduced to make sure the hasNext() method still works,
in particular for the case that a CAS produced by the reader does not result in any
output CAS.
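
The hasNext() handling described above can be sketched in plain Java. This is only a model under stated assumptions, not uimaFIT or UIMA code: strings stand in for CASes, the class name FlatteningIterator and the helper splitLines are hypothetical, and the Function argument plays the role that engine.processAndOutputNewCASes(jCas) would play on the aggregate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Plain-Java model of Option 1; nothing in here is UIMA or uimaFIT API. */
public class FlatteningIterator<I, O> implements Iterator<O> {
    private final Iterator<I> reader;               // stands in for the collection reader
    private final Function<I, Iterator<O>> engine;  // stands in for processAndOutputNewCASes
    // Outputs of the input currently being processed; part of the iterator state.
    private Iterator<O> current = Collections.emptyIterator();

    public FlatteningIterator(Iterator<I> reader, Function<I, Iterator<O>> engine) {
        this.reader = reader;
        this.engine = engine;
    }

    @Override
    public boolean hasNext() {
        // Keep pulling inputs until one yields output or the reader is exhausted.
        // This covers the case where an input produces no output at all.
        while (!current.hasNext() && reader.hasNext()) {
            current = engine.apply(reader.next());
        }
        return current.hasNext();
    }

    @Override
    public O next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    /** Hypothetical "multiplier": one segment per non-empty line of the document. */
    public static Iterator<String> splitLines(String document) {
        List<String> segments = new ArrayList<>();
        for (String line : document.split("\n")) {
            if (!line.isEmpty()) {
                segments.add(line);
            }
        }
        return segments.iterator();
    }

    public static void main(String[] args) {
        // The second "document" is empty and therefore yields zero segments.
        Iterator<String> docs = List.of("a\nb", "", "c").iterator();
        Iterator<String> it = new FlatteningIterator<>(docs, FlatteningIterator::splitLines);
        while (it.hasNext()) {
            System.out.println(it.next()); // prints a, b, c on separate lines
        }
    }
}
```

The look-ahead lives in hasNext() so that next() never has to return a placeholder for an input that produced nothing. In a real implementation the CASes returned by the inner iterator would presumably also need to be released after use, which this model ignores.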

Option 2 - let UIMA do the heavy lifting

An alternative and much simpler approach might be to create an aggregate which
contains not only the engines, but also the reader. Then you don't have to
worry about the reader anymore at all. Just create a UIMA JCasIterator and 
poll CASes from that until it is empty. Some additional info may be found in
the legacy issue 89 [1].

There are probably nasty details, but those should be roughly the general
approaches.

Cheers,

-- Richard

[1] https://code.google.com/p/uimafit/issues/detail?id=89

On 04.12.2013, at 01:16, Swirl <lr...@gmail.com> wrote:

> Richard Eckart de Castilho <re...@...> writes:
> 
>> 
>> For further reference:
>> 
>> https://issues.apache.org/jira/browse/UIMA-3470
> 
> Thanks for raising the Jira.
> 
> I tried looking at the source code, but I don't think I am able to come up with
> a solution for this.
> Do you have any pointers to get me started?
> 
> Thanks.

Re: Running CasMultiplier inside a JCasIterable

Posted by Swirl <lr...@gmail.com>.

Richard Eckart de Castilho <re...@...> writes:

> 
> For further reference:
> 
> https://issues.apache.org/jira/browse/UIMA-3470
> 

Thanks for raising the Jira.

I tried looking at the source code, but I don't think I am able to come up with
a solution for this.
Do you have any pointers to get me started?

Thanks.

Re: Running CasMultiplier inside a JCasIterable

Posted by Richard Eckart de Castilho <re...@apache.org>.

For further reference:

https://issues.apache.org/jira/browse/UIMA-3470

-- Richard

On 22.11.2013, at 07:37, Richard Eckart de Castilho <re...@apache.org> wrote:

> I believe the JCasIterable is currently implemented as a loop which calls
> "process" on the analysis engines for every CAS produced by the reader
> and then returns the corresponding CAS. This wouldn't work with multipliers.
> 
> Can you please file an issue in the Apache Jira, preferably with a minimal
> test case attached? It shouldn't be a big problem to fix this for the next
> release. A patch already fixing this would also work, of course ;)
> 
> Cheers,
> 
> -- Richard
> 
> On 22.11.2013, at 08:01, Swirl <lr...@gmail.com> wrote:
> 
>> I have successfully used a CasMultiplier to split up a document into segments
>> for further processing using SimplePipeline.runPipeline().
>> I did this by wrapping the CasMultiplier and the succeeding Annotator within an
>> aggregate.
>>
>> But when I simply switch from SimplePipeline.runPipeline() to a
>> JCasIterable, the code no longer runs correctly, i.e., it returns only one CAS
>> per physical document instead of the segments that I
>> expected.
>>
>> How can I get CasMultiplier to work with a JCasIterable?

Re: Running CasMultiplier inside a JCasIterable

Posted by Richard Eckart de Castilho <re...@apache.org>.

I believe the JCasIterable is currently implemented as a loop which calls
"process" on the analysis engines for every CAS produced by the reader
and then returns the corresponding CAS. This wouldn't work with multipliers.
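
To make the one-in, one-out behaviour concrete, here is a plain-Java model of such a loop. It is a sketch under stated assumptions and not the actual uimaFIT source: strings stand in for CASes, and the names OneToOneLoopModel, segment, oneCasPerDocument and allSegments are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of a loop that returns exactly one CAS per reader CAS. */
public class OneToOneLoopModel {

    /** Hypothetical stand-in for a multiplier: splits a document into sentences. */
    public static List<String> segment(String document) {
        return Arrays.asList(document.split("\\. "));
    }

    /** Models the loop described above: process each document, then return that document. */
    public static List<String> oneCasPerDocument(List<String> documents) {
        List<String> returned = new ArrayList<>();
        for (String doc : documents) {
            segment(doc);        // segments are produced but never handed to the caller
            returned.add(doc);   // the caller only ever sees the CAS that came from the reader
        }
        return returned;
    }

    /** What the caller expected instead: every segment of every document. */
    public static List<String> allSegments(List<String> documents) {
        List<String> returned = new ArrayList<>();
        for (String doc : documents) {
            returned.addAll(segment(doc));
        }
        return returned;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("One. Two. Three", "Four. Five");
        System.out.println(oneCasPerDocument(docs).size()); // prints 2
        System.out.println(allSegments(docs).size());       // prints 5
    }
}
```

The first count matches the symptom reported in the original question (one result per physical document), while the second is the number of segments the multiplier actually produced.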

Can you please file an issue in the Apache Jira, preferably with a minimal
test case attached? It shouldn't be a big problem to fix this for the next
release. A patch already fixing this would also work, of course ;)

Cheers,

-- Richard

On 22.11.2013, at 08:01, Swirl <lr...@gmail.com> wrote:

> I have successfully used a CasMultiplier to split up a document into segments
> for further processing using SimplePipeline.runPipeline().
> I did this by wrapping the CasMultiplier and the succeeding Annotator within an
> aggregate.
>
> But when I simply switch from SimplePipeline.runPipeline() to a
> JCasIterable, the code no longer runs correctly, i.e., it returns only one CAS
> per physical document instead of the segments that I
> expected.
>
> How can I get CasMultiplier to work with a JCasIterable?