Posted to user@uima.apache.org by GATE User <ga...@ymail.com> on 2013/05/20 04:21:09 UTC

Changing the original text based on annotations

1) How do I change the original message based on annotations in UIMA? For example, let's say I have the string:
201301012345

It contains both a date and a time. I want an annotator that finds such strings in the text and inserts a space between the two parts, so that it becomes:
20130101 2345

What's the easiest way to modify the text in this instance?

Also, let's say I have the sentence:

See Spot run far down Main Street.

and I have an annotator that finds and labels Main Street as a street name. Now I want to write an annotator that, whenever it finds a street name annotation, changes that street name into something else, such as River Blvd, so that the above sentence becomes:

See Spot run far down River Blvd.

What's the easiest way to do this? Afterwards, will I have to send the CAS through the pipeline again, or is there an easy way to update all annotations affected by the change, given that "River Blvd" is shorter than "Main Street"?

Thanks in advance.

Re: Changing the original text based on annotations

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
Hi,

UIMA Ruta (originally released as UIMA TextMarker) provides
functionality for modifying the document (in a new view) and for
transforming the offsets of annotations when the document is
changed. However, the two are not yet linked.
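The following is an independent, minimal sketch of what transforming the offsets involves. It is not Ruta's implementation. It assumes the modifications are non-overlapping (begin, end, replacement) triples over the original text, and TextEdit and OffsetMapper are hypothetical names. Begin and end offsets are mapped separately because an insertion that falls exactly on an annotation boundary should move a begin offset but leave an end offset where it is.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical edit: replace the original span [begin, end) with `replacement`.
// An insertion has begin == end; a deletion has an empty replacement.
record TextEdit(int begin, int end, String replacement) {}

final class OffsetMapper {
    private OffsetMapper() {}

    // Maps an annotation offset in the original text to the modified text.
    // Assumes the edits do not overlap. With isEnd == true, an insertion
    // exactly at the offset is NOT pulled into the annotation; with
    // isEnd == false, such an insertion pushes the begin offset past it.
    static int map(int offset, List<TextEdit> edits, boolean isEnd) {
        List<TextEdit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(TextEdit::begin)
                .thenComparingInt(TextEdit::end));
        int shift = 0; // accumulated length change of edits before the offset
        for (TextEdit e : sorted) {
            boolean insertionAtOffset = e.begin() == e.end() && e.begin() == offset;
            if (e.end() < offset || (e.end() == offset && !(isEnd && insertionAtOffset))) {
                // Edit lies entirely before the offset: shift by its length change.
                shift += e.replacement().length() - (e.end() - e.begin());
            } else if (e.begin() < offset) {
                // Offset lies strictly inside a replaced span: snap it to the
                // start (begin offsets) or end (end offsets) of the replacement.
                int newBegin = e.begin() + shift;
                return isEnd ? newBegin + e.replacement().length() : newBegin;
            }
        }
        return offset + shift;
    }
}
```

Consider the date example, a single insertion of " " at offset 8. An annotation over [8, 12) ("2345") maps to [9, 13). An annotation over [0, 8) ("20130101") stays at [0, 8) and does not absorb the inserted space. For "Main Street" replaced by "River Blvd", the span [22, 33) maps to [22, 32).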

There is a short introduction to modification in the documentation (last
paragraph):
http://uima.apache.org/d/textmarker-current/tools.textmarker.book.html#ugr.tools.tm.overview.examples

Just let me know if you need more detailed information.

Best,

Peter


Re: Changing the original text based on annotations

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.
On 20.05.2013 23:01, GATE User wrote:
> Thanks Richard and Peter:
>
> What I want to be able to do is, when the xml is returned, a program should then be able to find the "corrected" message and use that for future operations.  Will using views allow this?  Is it simply easier to just make a new CAS?  Thanks again.
>

Can you please provide more information?

I assume that "xml" refers to the document text you are processing. If
the processing won't happen in a single pipeline, I don't think it makes
a big difference whether you use an additional view or a completely new CAS.

You could add a new annotation that indicates that the covered text
has changed, in order to make the analysis engine sensitive to modifications.
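Below is one possible shape for such a marker. For each modification, it computes the span that the replacement text occupies in the modified document. An analysis engine could then check whether an annotation overlaps any of these spans. The sketch assumes modifications are non-overlapping (begin, end, replacement) triples on the original text. TextEdit, ChangedSpan and ChangeMarkers are hypothetical names. In UIMA, the spans would presumably become annotations of some custom type in the new view or CAS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical edit: replace the original span [begin, end) with `replacement`.
record TextEdit(int begin, int end, String replacement) {}

// Hypothetical marker: a span of the modified text produced by an edit.
record ChangedSpan(int begin, int end) {
    // Half-open overlap test. A zero-length marker (from a deletion) only
    // counts as overlapping when it lies strictly inside the other span.
    boolean overlaps(int otherBegin, int otherEnd) {
        return begin < otherEnd && otherBegin < end;
    }
}

final class ChangeMarkers {
    private ChangeMarkers() {}

    // Returns one marker per edit, in offset order, positioned in the
    // modified text. Assumes the edits do not overlap.
    static List<ChangedSpan> markers(List<TextEdit> edits) {
        List<TextEdit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(TextEdit::begin)
                .thenComparingInt(TextEdit::end));
        List<ChangedSpan> result = new ArrayList<>();
        int shift = 0; // length change contributed by earlier edits
        for (TextEdit e : sorted) {
            int newBegin = e.begin() + shift;
            result.add(new ChangedSpan(newBegin, newBegin + e.replacement().length()));
            shift += e.replacement().length() - (e.end() - e.begin());
        }
        return result;
    }
}
```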

If I have understood you correctly, then the approach Richard described
(or the Ruta implementation) should solve your problem.

Best,

Peter


Re: Changing the original text based on annotations

Posted by GATE User <ga...@ymail.com>.
Thanks Richard and Peter:

What I want to be able to do is, when the xml is returned, a program should then be able to find the "corrected" message and use that for future operations.  Will using views allow this?  Is it simply easier to just make a new CAS?  Thanks again.




________________________________
 From: Richard Eckart de Castilho <ri...@gmail.com>
To: user@uima.apache.org; GATE User <ga...@ymail.com> 
Sent: Monday, May 20, 2013 4:42 AM
Subject: Re: Changing the original text based on annotations
 

Re: Changing the original text based on annotations

Posted by Richard Eckart de Castilho <ri...@gmail.com>.
Hi,

UIMA doesn't allow the document text to be changed, but you can create a new view with the new text.

When I needed that, I implemented a set of annotations to mark text as "to be inserted/deleted/changed",
e.g. based on the results of a spell checker. Then I ran an annotator which interpreted all
these annotations and created a new view with the updated text. Subsequent annotators
then worked on the new view.
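Here is a minimal sketch of the core of this approach. It is written without any UIMA dependency so that it runs on its own. It assumes each edit annotation can be reduced to a (begin, end, replacement) triple over the original text. TextEdit and EditApplier are hypothetical names, not the API of the DKPro Core components. In an actual annotator, the returned string would presumably then be set as the document text of a new view, e.g. via JCas.createView(...) followed by setDocumentText(...).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an edit annotation: replace the original span
// [begin, end) with `replacement`. An insertion has begin == end; a deletion
// has an empty replacement. In a real pipeline these would be UIMA annotations.
record TextEdit(int begin, int end, String replacement) {}

final class EditApplier {
    private EditApplier() {}

    // Builds the updated text by applying non-overlapping edits in offset order.
    static String apply(String original, List<TextEdit> edits) {
        List<TextEdit> sorted = new ArrayList<>(edits);
        // Sort by begin, then end, so an insertion at position p precedes a
        // replacement that starts at p.
        sorted.sort(Comparator.comparingInt(TextEdit::begin)
                .thenComparingInt(TextEdit::end));
        StringBuilder out = new StringBuilder(original.length());
        int cursor = 0; // first original offset not yet copied to the output
        for (TextEdit e : sorted) {
            if (e.begin() < cursor || e.end() < e.begin() || e.end() > original.length()) {
                throw new IllegalArgumentException("Overlapping or invalid edit: " + e);
            }
            out.append(original, cursor, e.begin()); // unchanged text before the edit
            out.append(e.replacement());             // the inserted/replacing text
            cursor = e.end();                        // skip the replaced original span
        }
        out.append(original, cursor, original.length()); // unchanged tail
        return out.toString();
    }
}
```

The examples from the question work as follows. Inserting " " at offset 8 of "201301012345" gives "20130101 2345". Replacing [22, 33) ("Main Street") in "See Spot run far down Main Street." with "River Blvd" gives "See Spot run far down River Blvd.".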

My earlier work on this is published as

Eckart De Castilho, Richard, and Iryna Gurevych. 
"DKPro-UGD: A Flexible Data-Cleansing Approach to Processing User-Generated Discourse." [1]

The latest version of the components described there is available in DKPro Core [2].

Cheers,

-- Richard

[1] http://www.ukp.tu-darmstadt.de/publications/details/?no_cache=1&tx_bibtex_pi1%5Bpub_id%5D=TUD-CS-2009-0078
[2] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.castransformation-asl
