Posted to user@uima.apache.org by Radwen ANIBA <ar...@gmail.com> on 2009/06/23 17:33:28 UTC

setBegin and setEnd when annotating a text

Hi

I have to search for a regular expression within a file, but I need to
adjust the begin and end values slightly, since the document itself contains
text like this:


lsdkqjqldjqslkdjqsldkjqsldqjsjjjjjqslkdjqslkdjqlsdkjqlsdkj
////
qsmdkqsmdlkqsmdkqjjjjqsùmdlqsùdlqsùdml
////
dsdjjjjjqsùmdlùqld

The text I'm looking for is "jjjj", but the separator //// MUST reset the
begin and end values to zero each time it occurs in the document. That is,
for the first line jjjj must be found at [30-34], on line 3 at [18-21], and
so on, so that the begin and end values are relative to the line and not to
the entire document.

Does anyone have a solution for modifying the begin and end values?

Thx

Rad

Re: setBegin and setEnd when annotating a text

Posted by Marshall Schor <ms...@schor.com>.
Not sure what you're doing here...  but a couple of thoughts:

1) if the //// is meant to separate "documents" and you want to process
each one separately, you could have a CAS reader component read the
original data, split it into separate docs, and put each one in a CAS to
be processed independently.

2) if you just need to reset the begin and end for an annotation, these
are settable fields - so you can just set them to what you want.  But be
aware that some methods (e.g. getCoveredText()) use these values to
locate the characters in the subject of analysis, and if you change the
begin and end, it is likely that other code will break.

-Marshall
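
To illustrate the two suggestions above, here is a minimal sketch in plain
Java (no UIMA classes involved) that walks the text segment by segment and
records, for every match, both the absolute offsets and the offsets relative
to the start of the current segment. The names SegmentOffsets, Hit and find
are made up for this example, and the separator is assumed to sit on a line
of its own. Offsets are 0-based with an exclusive end, following
java.util.regex, so they will not necessarily line up with the figures
quoted in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA.
final class SegmentOffsets {

    /** One match, with offsets into the whole text and into its own segment. */
    static final class Hit {
        final int segment;   // 0-based index of the segment containing the match
        final int absBegin;  // begin offset in the whole text
        final int absEnd;    // end offset (exclusive) in the whole text
        final int relBegin;  // begin offset counted from the segment start
        final int relEnd;    // end offset (exclusive) counted from the segment start

        Hit(int segment, int absBegin, int absEnd, int relBegin, int relEnd) {
            this.segment = segment;
            this.absBegin = absBegin;
            this.absEnd = absEnd;
            this.relBegin = relBegin;
            this.relEnd = relEnd;
        }
    }

    /** Finds all matches of pattern, segment by segment, in text split on separator. */
    static List<Hit> find(String text, String separator, Pattern pattern) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<Hit> hits = new ArrayList<>();
        int segStart = 0;
        int segment = 0;
        while (true) {
            int sepAt = text.indexOf(separator, segStart);
            int segEnd = (sepAt < 0) ? text.length() : sepAt;

            // Restrict the search to the current segment; start() and end()
            // still report offsets into the whole text.
            Matcher m = pattern.matcher(text);
            m.region(segStart, segEnd);
            while (m.find()) {
                hits.add(new Hit(segment, m.start(), m.end(),
                                 m.start() - segStart, m.end() - segStart));
            }

            if (sepAt < 0) {
                break; // last segment processed
            }
            segStart = sepAt + separator.length();
            segment++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "abjjjjcd\n////\nxjjjjy\n////\njjjj";
        for (Hit h : find(text, "\n////\n", Pattern.compile("jjjj"))) {
            System.out.printf("segment %d: absolute [%d,%d) relative [%d,%d)%n",
                    h.segment, h.absBegin, h.absEnd, h.relBegin, h.relEnd);
        }
    }
}
```

In an annotator, one option would be to pass the absolute offsets to
setBegin and setEnd, so that getCoveredText() keeps working, and to keep the
relative offsets in separate features of a custom annotation type (such a
type is an assumption here, it would have to be defined in the type system).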
