Posted to user@uima.apache.org by Katrin Tomanek <ka...@uni-jena.de> on 2008/07/28 17:59:09 UTC

annotation comparator

Hi,

I am looking for a tool to compare UIMA annotations, e.g., between
different views. Such a tool might be used as a "CASDiff", i.e. the
differences between annotations might be added to the CAS as special
diff annotations so that they can be visualized in a viewer.

I know that there is a tool by the University of Magdeburg. However, I
have not had the chance to take a closer look at it.

Has anybody else already developed such a tool?

Thanks
Katrin

Re: annotation comparator

Posted by Igor Sominsky <so...@gmail.com>.

Peter,

CFE supports configuration-driven feature extraction. The extracted features
can be used for comparison, among other functions. As Eddie pointed out in
his email, the application decides which features are relevant to a
particular comparison. The criteria for comparing extracted features can
also differ for every application.

With CFE we performed the comparison in 3 major steps:
1. Identification of the features to be compared in both sources and of the
   rules for their comparison. In this step we write the configuration files
   for feature extraction.
2. Feature extraction and alignment of the extracted features. In this step
   the identified features are extracted from both sources being compared
   into character-separated files, and they are aligned based on the
   begin|end offsets of their containing annotation objects.
3. The results of the alignment are imported into a spreadsheet where
   performance metrics (precision/recall/F-score) are calculated.

No doubt the process should be further automated.
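
As a rough illustration of how steps 2 and 3 above could be automated, here is a minimal sketch. It is not CFE code. It assumes that each source has already been exported as pipe-separated rows of the hypothetical form begin|end|value, aligns the rows on their begin|end offsets, and computes precision, recall and F-score directly instead of going through a spreadsheet. The function names and the sample data are made up for the example.

```python
import csv
import io


def load_rows(text):
    """Parse 'begin|end|value' rows into a dict keyed by (begin, end)."""
    rows = {}
    for begin, end, value in csv.reader(io.StringIO(text), delimiter="|"):
        rows[(int(begin), int(end))] = value
    return rows


def align(gold, system):
    """Pair up rows from both sources that share the same offsets."""
    keys = sorted(set(gold) | set(system))
    return [(key, gold.get(key), system.get(key)) for key in keys]


def score(aligned):
    """Count matches on aligned rows and derive precision/recall/F-score."""
    tp = sum(1 for _, g, s in aligned if g is not None and g == s)
    n_system = sum(1 for _, _, s in aligned if s is not None)
    n_gold = sum(1 for _, g, _ in aligned if g is not None)
    precision = tp / n_system if n_system else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


# Made-up example data, not the output of any real extraction run.
GOLD = "0|5|Person\n10|16|Location\n20|27|Person\n"
SYSTEM = "0|5|Person\n10|16|Person\n30|34|Location\n"

aligned = align(load_rows(GOLD), load_rows(SYSTEM))
precision, recall, f_score = score(aligned)
print(aligned)
print(precision, recall, f_score)
```

Here a row only counts as a match when both the offsets and the extracted value agree; an application with different comparison rules would change the condition inside score().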

Let me know if you have questions.


Re: annotation comparator

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.

Hi,

Thanks for your answers.

A default text-based diff of pretty-printed annotations may not be a
solution for my specific requirements, but it is a nice alternative for
manual testing (I am already using the pretty-print methods for that). I
think I will keep my simple solution as a start; it works similarly to
the one you proposed, but directly compares the features of the
annotations in Java.

I was wondering whether the CFE project supports some sort of
comparison or testing, since the paper has "testing" in its title, but I
haven't found any suitable fragments in the source code.

In the long run, a good and reusable solution for the comparison and
automatic back-testing of annotations and/or FSs could become an
interesting component. Maybe there is a possibility to combine some efforts?
(pointing to Katrin, among others)

Have a nice weekend,

Peter





-- 
Peter Klügl
University of Würzburg
pkluegl@uni-wuerzburg.de

Re: annotation comparator

Posted by Eddie Epstein <ea...@gmail.com>.

The problem with generic CAS comparison is the potential complexity of the
object model represented in a CAS. Instead of a single general purpose
method, another approach is application (or object model) specific
formatting code that would create output specifically designed for
comparison.

If the object model to be compared is limited to annotations, just dumping
all annotations, each as a single line without covered text, in index order
would be useful as input to a standard diff program. Further sorting by
annotation type before the diff might help make the differences more
understandable in some situations.
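
A minimal sketch of the dump-and-diff idea, assuming the annotations have already been read out of each CAS into plain (type, begin, end, features) tuples. The tuple layout, the dump() helper and the sample data are assumptions made for this example and are not UIMA API. The sort key only roughly mirrors the default annotation index order (ascending begin, then longer spans first).

```python
import difflib


def dump(annotations, by_type=False):
    """Render each annotation as one line: type, offsets, then features.

    The covered text is deliberately left out of the line.
    """
    def index_order(a):
        # Ascending begin, then longer spans first, then type name.
        return (a[1], -a[2], a[0])

    def type_then_index(a):
        return (a[0],) + index_order(a)

    ordered = sorted(annotations,
                     key=type_then_index if by_type else index_order)
    lines = []
    for type_name, begin, end, features in ordered:
        feats = " ".join(f"{k}={v}" for k, v in sorted(features.items()))
        lines.append(f"{type_name} [{begin},{end}] {feats}".rstrip())
    return lines


# Made-up annotations from two hypothetical analysis runs.
RUN_A = [
    ("Token", 0, 5, {"pos": "NNP"}),
    ("Entity", 0, 5, {"kind": "Person"}),
    ("Token", 6, 11, {"pos": "VBD"}),
]
RUN_B = [
    ("Token", 0, 5, {"pos": "NNP"}),
    ("Entity", 0, 5, {"kind": "Location"}),
    ("Token", 6, 11, {"pos": "VBD"}),
]

diff = list(difflib.unified_diff(
    dump(RUN_A, by_type=True), dump(RUN_B, by_type=True),
    fromfile="run_a", tofile="run_b", lineterm=""))
print("\n".join(diff))
```

Writing the two dumps to files and running a standard diff program on them would work just as well; difflib is used here only to keep the example in one place.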

If you are interested, there are some annotation pretty print options in
UIMA that could help here.

Eddie


Re: annotation comparator

Posted by Thilo Goetz <tw...@gmx.de>.

Peter Klügl wrote:
> Hi,
> 
> What is the status quo for the comparison of two CASes right now? Is there
> any usable solution yet (with or without documentation)?
>
> I am developing a rule-based system (with scripting functionality)
> especially for complex information and text extraction tasks. The IDE is
> DLTK-based and UIMA descriptors (for a generic implementation) are
> generated automatically. Currently I am improving an information
> extraction application with a test-driven approach. The test cases are,
> of course, CAS XMI files and the comparison (of two CASes) works, but
> is still unsatisfying. I am especially interested in annotations for the
> false positives and false negatives (overlapping or not overlapping).
>
> Back to my question:
> How do you all compare two CASes?
> Is there a reusable implementation?

I don't know of one.


Re: annotation comparator

Posted by Peter Klügl <pk...@uni-wuerzburg.de>.

Hi,

What is the status quo for the comparison of two CASes right now? Is there
any usable solution yet (with or without documentation)?

I am developing a rule-based system (with scripting functionality)
especially for complex information and text extraction tasks. The IDE is
DLTK-based and UIMA descriptors (for a generic implementation) are
generated automatically. Currently I am improving an information
extraction application with a test-driven approach. The test cases are,
of course, CAS XMI files and the comparison (of two CASes) works, but
is still unsatisfying. I am especially interested in annotations for the
false positives and false negatives (overlapping or not overlapping).
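
To make the overlapping / not overlapping distinction concrete, here is a minimal sketch that works on plain (begin, end) spans of a single annotation type. It assumes the spans have already been read from the expected and the actual CAS XMI files; classify(), overlaps() and the sample spans are made up for the example and are not part of any existing tool.

```python
def overlaps(a, b):
    """True if two (begin, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def classify(gold, system):
    """Split spans into exact matches, partial overlaps, FPs and FNs."""
    gold_set, system_set = set(gold), set(system)
    exact = sorted(gold_set & system_set)
    gold_rest = sorted(gold_set - system_set)
    system_rest = sorted(system_set - gold_set)
    return {
        "exact": exact,
        # False positives: system spans with no identical gold span.
        "fp_overlapping": [s for s in system_rest
                           if any(overlaps(s, g) for g in gold_rest)],
        "fp_disjoint": [s for s in system_rest
                        if not any(overlaps(s, g) for g in gold_rest)],
        # False negatives: gold spans the system did not reproduce exactly.
        "fn_overlapping": [g for g in gold_rest
                           if any(overlaps(g, s) for s in system_rest)],
        "fn_disjoint": [g for g in gold_rest
                        if not any(overlaps(g, s) for s in system_rest)],
    }


# Made-up (begin, end) spans for one annotation type.
GOLD = [(0, 5), (10, 16), (20, 27)]
SYSTEM = [(0, 5), (12, 16), (30, 34)]

result = classify(GOLD, SYSTEM)
for name, spans in result.items():
    print(name, spans)
```

Overlap is only checked against spans that were not matched exactly. Whether a partial overlap should count as half a hit or as a plain error is left to the application.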

Back to my question:
How do you all compare two CASes?
Is there a reusable implementation?


Peter




-- 
Peter Klügl
University of Würzburg
pkluegl@uni-wuerzburg.de

Re: annotation comparator

Posted by Katrin Tomanek <ka...@uni-jena.de>.

Hi,


> Depends what your favorite tooling story is.  If you prefer
> the eclipse tooling, it should go into eclipse.  I know
> people who would use this kind of functionality if it was
> in CVD :-)
> 
>>
>> And shouldn't the differences be kept as new annotation types so the 
>> viewers don't need to be changed?
> 
> Somehow I don't see that.  The tooling could be made a lot
> nicer if it knows it's displaying differences.  And I wouldn't
> want to add annotations to my data just for display purposes.
> Or maybe I misunderstood?

Hmm, not sure. This is probably data that is only used in evaluation
scenarios, so I don't see a big problem with it.
Well, in our first version we now just add new types.  Works OK for us
so far. However, it's really just a first version...

Katrin

Re: annotation comparator

Posted by Thilo Goetz <tw...@gmx.de>.

Katrin Tomanek wrote:
> Hi,
> 
>> Hm, not exactly this, but something similar.  We have various forms
>> of a technology that will compare annotations in various CASes.  This
>> is useful if, for example, you run some analysis and save the results.
>> Then you modify your analysis and want to see what difference this made
>> when run on your sample text.  Not sure this is what you want.
> 
> Well, this is certainly a similar scenario. Although I would probably 
> compare against a gold standard, from a technical perspective the same 
> tools can do the job, of course.

True.  In your case you probably want some statistics as well, right?

> 
>> The basic comparison is fairly simple (if the type system is constant),
>> displaying the differences in a meaningful way is the real challenge.
>> We're accepting contributions to the CVD :-)
> 
> Do you think the CVD is the right place for this?

Depends what your favorite tooling story is.  If you prefer
the eclipse tooling, it should go into eclipse.  I know
people who would use this kind of functionality if it was
in CVD :-)

> 
> And shouldn't the differences be kept as new annotation types so the 
> viewers don't need to be changed?

Somehow I don't see that.  The tooling could be made a lot
nicer if it knows it's displaying differences.  And I wouldn't
want to add annotations to my data just for display purposes.
Or maybe I misunderstood?

> 
> Katrin

Re: annotation comparator

Posted by Katrin Tomanek <ka...@uni-jena.de>.

Hi,

> Hm, not exactly this, but something similar.  We have various forms
> of a technology that will compare annotations in various CASes.  This
> is useful if, for example, you run some analysis and save the results.
> Then you modify your analysis and want to see what difference this made
> when run on your sample text.  Not sure this is what you want.

Well, this is certainly a similar scenario. Although I would probably 
compare against a gold standard, from a technical perspective the same 
tools can do the job, of course.

> The basic comparison is fairly simple (if the type system is constant),
> displaying the differences in a meaningful way is the real challenge.
> We're accepting contributions to the CVD :-)

Do you think the CVD is the right place for this?

And shouldn't the differences be kept as new annotation types so the 
viewers don't need to be changed?

Katrin

Re: annotation comparator

Posted by Tong Fin <to...@gmail.com>.

On Mon, Jul 28, 2008 at 2:07 PM, Thilo Goetz <tw...@gmx.de> wrote:

> Come to think of it, doesn't the CAS Viewer do something like this?
> Tong?
>
> --Thilo
>
>
It is possible to extend the CAS Viewer to support that if the comparison is
simple. But we would need some way to justify the effort.

-- Tong

Re: annotation comparator

Posted by Thilo Goetz <tw...@gmx.de>.

Katrin Tomanek wrote:
> Hi,
> 
> I am looking for a tool to compare UIMA annotations, e.g., between
> different views. Such a tool might be used as a "CASDiff", i.e. the
> differences between annotations might be added to the CAS as special
> diff annotations so that they can be visualized in a viewer.
>
> I know that there is a tool by the University of Magdeburg. However, I
> have not had the chance to take a closer look at it.
>
> Has anybody else already developed such a tool?
> 
> Thanks
> Katrin

Hm, not exactly this, but something similar.  We have various forms
of a technology that will compare annotations in various CASes.  This
is useful if, for example, you run some analysis and save the results.
Then you modify your analysis and want to see what difference this made
when run on your sample text.  Not sure this is what you want.

The basic comparison is fairly simple (if the type system is constant);
displaying the differences in a meaningful way is the real challenge.
We're accepting contributions to the CVD :-)

Come to think of it, doesn't the CAS Viewer do something like this?
Tong?

--Thilo

Re: annotation comparator

Posted by Katrin Tomanek <ka...@uni-jena.de>.

Yoshinobu KANO wrote:
> We have developed a comparison tool which seems related to what you are
> looking for.
> Our tool compares annotations inside a CAS and visualizes the results.
> However, generally speaking, the problem is always which annotations
> could/should be compared.
> Could you explain your purpose in a bit more detail?

Well, in a simple scenario, say NER, we just want to compare the entity
mentions found by different components (either different analysis
components or, e.g., a gold standard). Features of the annotation types
should also be considered during the comparison. I thought that such a
comparator would add new annotations to the CAS for TP/FP/... so that
overall scores (R/P/F) can be calculated from these annotations.
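
A minimal sketch of that idea, with plain Python tuples standing in for CAS annotations. The Mention and DiffAnnotation types, the feature layout and the sample data are assumptions made for this example; in a real comparator the diff annotations would be instances of a dedicated type added to the CAS.

```python
from collections import Counter
from typing import NamedTuple


class Mention(NamedTuple):
    begin: int
    end: int
    features: tuple  # e.g. (("kind", "Person"),); hashable on purpose


class DiffAnnotation(NamedTuple):
    begin: int
    end: int
    label: str  # "TP", "FP" or "FN"


def diff_annotations(gold, system):
    """Label every mention; offsets AND features must match for a TP."""
    gold_set, system_set = set(gold), set(system)
    diffs = [DiffAnnotation(m.begin, m.end, "TP")
             for m in gold_set & system_set]
    diffs += [DiffAnnotation(m.begin, m.end, "FP")
              for m in system_set - gold_set]
    diffs += [DiffAnnotation(m.begin, m.end, "FN")
              for m in gold_set - system_set]
    return sorted(diffs)


# Made-up entity mentions from a gold standard and one component.
GOLD = [Mention(0, 5, (("kind", "Person"),)),
        Mention(10, 16, (("kind", "Location"),))]
SYSTEM = [Mention(0, 5, (("kind", "Person"),)),
          Mention(10, 16, (("kind", "Person"),))]

diffs = diff_annotations(GOLD, SYSTEM)
counts = Counter(d.label for d in diffs)
print(diffs)
print(counts)
```

With this labelling, a mention with the right span but a wrong feature value yields one FP and one FN on the same offsets. Precision is then TP / (TP + FP) and recall is TP / (TP + FN), computed from the label counts.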

Katrin

Re: annotation comparator

Posted by Yoshinobu KANO <ka...@is.s.u-tokyo.ac.jp>.

Hi Marshall,


>> We have developed a comparison tool which seems related to what you are
>> looking for.
>> Our tool compares annotations inside a CAS and visualizes the results.
>> However, generally speaking, the problem is always which annotations
>> could/should be compared.
>> Could you explain your purpose in a bit more detail?
>>
>
> Hi KANO-san,
>
> I would like to hear more about your tool.  Is there any description /
> documentation, etc. available for it?


Thank you for your interest in our project.
The documentation is currently in preparation.
We are aiming to make our public release at the beginning of August,
hopefully; we will then post an announcement to this mailing list.
Could you please wait a little while?

Thanks,

-Yoshinobu
-- 
Yoshinobu KANO
kano@is.s.u-tokyo.ac.jp
Research Associate, the University of Tokyo
http://www-tsujii.is.s.u-tokyo.ac.jp/

Re: annotation comparator

Posted by Marshall Schor <ms...@schor.com>.


Yoshinobu KANO wrote:
> We have developed a comparison tool which seems related to what you are
> looking for.
> Our tool compares annotations inside a CAS and visualizes the results.
> However, generally speaking, the problem is always which annotations
> could/should be compared.
> Could you explain your purpose in a bit more detail?

Hi KANO-san,

I would like to hear more about your tool.  Is there any description / 
documentation, etc. available for it?

-Marshall


Re: annotation comparator

Posted by Yoshinobu KANO <ka...@is.s.u-tokyo.ac.jp>.

We have developed a comparison tool which seems related to what you are
looking for.
Our tool compares annotations inside a CAS and visualizes the results.
However, generally speaking, the problem is always which annotations
could/should be compared.
Could you explain your purpose in a bit more detail?

-Yoshinobu



-- 
Yoshinobu KANO
kano@is.s.u-tokyo.ac.jp
Research Associate, the University of Tokyo
http://www-tsujii.is.s.u-tokyo.ac.jp/