Posted to user@uima.apache.org by Jim Hargrave <Ha...@ldschurch.org> on 2010/09/09 01:00:50 UTC

Mutable text and annotations...

I apologize if my terminology doesn't match with normal UIMA usage - but hopefully the general idea will be understandable.

Is it always assumed that UIMA's document text is immutable? Let's say you have some text with several position-based annotations. When the text changes, all of your annotation positions become incorrect. Are there APIs that allow you to change the text while keeping the offsets in your annotations valid?

Jim


 NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.



Re: Mutable text and annotations...

Posted by Nicolas Hernandez <ni...@gmail.com>.
Dear Jim

I think I understand what you mean. Using external analyzers that
add tags to or remove tags from the text with regular expressions can
lead to this situation. The problem does not arise with stand-off
annotations, which leave the text itself untouched.

Assuming that only the number of whitespace characters has changed, you
can index each annotation on the **rank** of its non-whitespace
characters, that is, their position when only non-whitespace characters
are counted. These ranks do not change as long as no non-whitespace
character is added or removed. Thanks to the ranks you will be able to
realign the annotations from the prior offsets to the new ones.
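
A minimal sketch of the rank idea in Python, not tied to any UIMA API. The function names are hypothetical, and the sketch assumes that the two texts differ only in whitespace and that each span is non-empty and begins and ends on a non-whitespace character.

```python
def offset_to_rank(text, offset):
    """Count the non-whitespace characters strictly before `offset`."""
    return sum(1 for ch in text[:offset] if not ch.isspace())


def rank_to_offset(text, rank):
    """Return the offset of the non-whitespace character with this 0-based rank."""
    seen = 0
    for i, ch in enumerate(text):
        if not ch.isspace():
            if seen == rank:
                return i
            seen += 1
    raise ValueError("rank out of range for this text")


def realign(old_text, new_text, begin, end):
    """Map the span [begin, end) of old_text onto new_text via character ranks.

    Assumption: only whitespace differs between the texts, and the span is
    non-empty with non-whitespace characters at both ends.
    """
    first = offset_to_rank(old_text, begin)
    last = offset_to_rank(old_text, end - 1)  # rank of the span's last character
    return rank_to_offset(new_text, first), rank_to_offset(new_text, last) + 1
```

If any non-whitespace character is added or removed, the ranks shift and this mapping is no longer valid.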

This is the solution I imagined once.

Hope this helps.





-- 
Nicolas.Hernandez@univ-nantes.fr
--
http://www.univ-nantes.fr/hernandez-n
# Laboratoire LINA-TALN CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
# Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67

Re: Mutable text and annotations...

Posted by Eddie Epstein <ea...@gmail.com>.
On Fri, Sep 10, 2010 at 2:22 AM, Jim <jh...@comcast.net> wrote:
> We are looking to build an editor for (human) translators that would display
> many layers of offset-based annotations while allowing real-time edits of
> both the text and possibly the annotations themselves.

One type system model used for machine translation treats the source text as
an immutable subject of analysis and stores the translation as annotations.
Since annotations can be deleted or modified at any time, the translation itself
is dynamic. When the translation itself needs to be analyzed, it is
assembled from the active annotations and used to create a new view with
the translation as its subject of analysis.
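
A rough Python sketch of that model, independent of the UIMA API. The `TranslationAnnotation` class, its `active` flag, and the rule of ordering segments by source offset are all assumptions made for illustration; a real translation may reorder segments.

```python
from dataclasses import dataclass


@dataclass
class TranslationAnnotation:
    begin: int           # span in the immutable source text
    end: int
    target: str          # translated text for that span
    active: bool = True  # superseded translations are kept but inactive


def assemble_translation(annotations, separator=" "):
    """Build the current translation from the active annotations.

    Assumption: target segments follow the order of their source spans.
    """
    active = sorted((a for a in annotations if a.active), key=lambda a: a.begin)
    return separator.join(a.target for a in active)


source = "Le chat dort."  # never modified
annotations = [
    TranslationAnnotation(0, 2, "The"),
    TranslationAnnotation(3, 7, "cat"),
    TranslationAnnotation(8, 13, "sleeps."),
    TranslationAnnotation(8, 13, "is sleeping.", active=False),
]
translation = assemble_translation(annotations)
```

Editing the translation then means toggling or replacing annotations. The assembled string is what would become the subject of analysis of a new view.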

Can you say more about "many layers" of annotations? Is there interest in
keeping a history of changes? The CAS type system together with the CAS view
mechanism offers a lot of options.

Eddie


Re: Mutable text and annotations...

Posted by Marshall Schor <ms...@schor.com>.
 Here's one suggestion.

I imagine that if someone edited the text of a paragraph, even a simple edit could
end up changing many of the annotations.  So, a good approach would be to
redo the annotations of the changed text from the ground up.

This means that you would create a new CAS with its subject-of-analysis being
the changed document, and run it through the pipeline again.

I will give a poor example (I'm not a great linguist...). Suppose the original text was:

  The wet bank was close to the bridge.  It was full of people in bathing suits.

with annotations linking "It" to "bank" and identifying "bank" as
the side of a river.

Then suppose you changed it to:

  The central bank was close to the bridge.  It was full of people in bathing suits.

Now the annotations would link "It" to "bridge" and identify "bank" as a
financial institution. You can see that smallish changes could
have long-distance and complex consequences.


I suppose the reason you don't want to take the straightforward approach of
creating a new CAS for every change is a concern that it would be too
inefficient.

There are a couple of ways that could be addressed.  The "document" (or whatever
you want to call the thing being worked on) could be split into smallish units
(for example, paragraphs), so the thing being re-processed would be smaller.  Of
course, this means that inter-paragraph effects would be lost.
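
A small Python sketch of the per-paragraph idea, assuming paragraphs are separated by blank lines. `find_capitalized` is a hypothetical stand-in for a real analysis pipeline, and the cache keyed by paragraph text is an illustrative choice, not something UIMA provides.

```python
import re


def split_paragraphs(document):
    """Split on blank lines (an assumption about what a 'unit' is)."""
    return [p for p in re.split(r"\n\s*\n", document) if p.strip()]


def find_capitalized(paragraph):
    """Hypothetical stand-in for a pipeline: spans of capitalized words."""
    return [m.span() for m in re.finditer(r"\b[A-Z]\w*", paragraph)]


class ParagraphAnnotator:
    def __init__(self, analyze):
        self._analyze = analyze
        self._cache = {}  # paragraph text -> annotations (paragraph-relative offsets)
        self.runs = 0     # number of times the analysis actually ran

    def annotate(self, document):
        """Annotate a document, re-analyzing only paragraphs not seen before."""
        results = []
        for paragraph in split_paragraphs(document):
            if paragraph not in self._cache:
                self._cache[paragraph] = self._analyze(paragraph)
                self.runs += 1
            results.append((paragraph, self._cache[paragraph]))
        return results
```

Offsets here are relative to each paragraph, so an edit in one paragraph does not invalidate the others. The cache in this sketch grows without bound, and anything that depends on neighboring paragraphs is not captured.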

Another thing you could do is to use the capability of the CAS to support
multiple views. Each view has its own subject of analysis.  (See
http://uima.apache.org/downloads/releaseDocs/2.3.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs)

You could then try to write some kind of "fast path" that, for an updated text,
attempts to carry over as many of the previous annotations from the original text
as possible.  I think this is a difficult problem to solve in general, but
fast paths may exist for specific cases.
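
One possible fast path, sketched in Python with the standard `difflib` module rather than any UIMA facility: annotations that lie entirely inside an unchanged stretch of text are shifted to their new offsets, and everything else is flagged for re-analysis. The function name and the (begin, end) tuple representation are assumptions.

```python
from difflib import SequenceMatcher


def remap_annotations(old_text, new_text, spans):
    """Shift spans that fall wholly inside an unchanged block; report the rest.

    Returns (kept, stale): kept holds spans in new_text coordinates,
    stale holds spans in old_text coordinates that need re-analysis.
    """
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    blocks = matcher.get_matching_blocks()
    kept, stale = [], []
    for begin, end in spans:
        for block in blocks:
            if block.a <= begin and end <= block.a + block.size:
                shift = block.b - block.a
                kept.append((begin + shift, end + shift))
                break
        else:
            stale.append((begin, end))
    return kept, stale


old = "The wet bank was close to the bridge."
new = "The central bank was close to the bridge."
spans = [(4, 7), (8, 12), (30, 36)]  # "wet", "bank", "bridge" in old
kept, stale = remap_annotations(old, new, spans)
```

This only preserves positions. In the bank example the "bank" span survives the remapping even though its sense has changed, so annotations whose meaning depends on context would still need to be recomputed.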

HTH.  -Marshall Schor



Re: Mutable text and annotations...

Posted by Jim <jh...@comcast.net>.
We are looking to build an editor for (human) translators that would 
display many layers of offset-based annotations while allowing real-time 
edits of both the text and possibly the annotations themselves.

So far this project (link below) is the best example we have seen. We 
were wondering if UIMA had something similar or could offer us some 
insights. We are certainly interested in applying UIMA annotators, but 
it's the real-time editing part we are finding challenging.

http://code.google.com/p/wave-robot-java-client/

Jim





Re: Mutable text and annotations...

Posted by Thilo Götz <tw...@gmx.de>.
Hi,

On 9/9/2010 01:00, Jim Hargrave wrote:
> I apologize if my terminology doesn't match with normal UIMA usage - but hopefully the general idea will be understandable.
> 
> Is it always assumed that UIMA's document text is immutable? 

yes.

> Let's say you have some text with several position-based annotations. When the text changes, all of your annotation positions become incorrect. Are there APIs that allow you to change the text while keeping the offsets in your annotations valid?

There is no built-in support for this sort of thing in UIMA.
It would be easy to do after UIMA analysis has finished, but
I imagine you want to modify the text during analysis.  That
is not possible because UIMA subjects of analysis are
immutable.
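
For the after-analysis case, a minimal Python sketch of adjusting offsets for a single known edit, outside of UIMA. The function name, the tuple representation of spans, and the policy of dropping any span that touches the edited region are assumptions for illustration.

```python
def apply_edit(text, spans, start, end, replacement):
    """Replace text[start:end] with `replacement` and adjust (begin, end) spans.

    Spans before the edit are unchanged, spans after it are shifted by the
    length difference, and spans overlapping the edit are reported as dropped.
    """
    new_text = text[:start] + replacement + text[end:]
    delta = len(replacement) - (end - start)
    kept, dropped = [], []
    for b, e in spans:
        if e <= start:
            kept.append((b, e))
        elif b >= end:
            kept.append((b + delta, e + delta))
        else:
            dropped.append((b, e))
    return new_text, kept, dropped


text = "The wet bank was close to the bridge."
spans = [(0, 3), (4, 7), (8, 12)]  # "The", "wet", "bank"
new_text, kept, dropped = apply_edit(text, spans, 4, 7, "central")
```

This keeps offsets consistent with the edited text but says nothing about whether the surviving annotations are still correct.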

If you give us more details, we may have some ideas about
different approaches to the issue.

--Thilo
