Posted to user@uima.apache.org by Martí Quixal <ma...@psycho.uni-tuebingen.de> on 2020/12/01 13:32:28 UTC

RUTA - RUTA gold standard evaluation functionality

Dear Listers,

I am not sure this is the right forum to ask this question, but I will try it out.

We have been using RUTA (in Eclipse) for a while now and were thinking of building a gold standard using a semi-automated approach: we develop rules, we tag the corpus, we assume most of it will be right, we develop again and hope that what was correctly tagged keeps working while new rules are refined until they do their job.

While doing this we have come up with a couple of issues, one of which I would like to solve.

We have moved the files from our output folder to our test folder (the folder with the script name) and run the evaluation in the Annotation Testing view using the green triangle.

When we look at the TP, FP and FN counts, we do not get a perfect match (100% TP, 0% FP and FN). That is surprising, I guess. When I look into the file in the test folder, particularly in the FP view, I can see where the discrepancy is supposedly found, but not what the difference is between what was expected and what was actually tagged (I just see the word “error”). Is there a way to get more specific information here?
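
For reference, here is the standard relation between these counts and the evaluation scores, as a minimal sketch with made-up counts; these are the usual definitions, not anything specific to Ruta's implementation:

public final class EvalScores {

    public static void main(String[] args) {
        // Hypothetical counts; a perfect match means fp == 0 and fn == 0.
        int tp = 42, fp = 0, fn = 0;

        double precision = tp / (double) (tp + fp); // 1.0 for a perfect match
        double recall = tp / (double) (tp + fn);    // 1.0 for a perfect match
        double f1 = 2 * precision * recall / (precision + recall);

        System.out.printf("P=%.2f R=%.2f F1=%.2f%n", precision, recall, f1);
    }
}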

I attach two screen captures in case they help a bit.

Thanks a lot in advance for your time.

Best regards,
Martí

PS: I have no experience in Java and know only the basics of object-oriented programming. I am doing this as an end-user of RUTA, in that sense.



--
Martí Quixal, PhD
AB Schulpsychologie / Seminar für Sprachwissenschaft
Eberhard Karls Universität Tübingen

Web:
- http://www.sfs.uni-tuebingen.de/~quixal/
- https://uni-tuebingen.de/en/158882

DigBinDiff: 
- https://uni-tuebingen.de/en/169776


Re: RUTA - RUTA gold standard evaluation functionality

Posted by Quixal, Martí <ma...@psycho.uni-tuebingen.de>.
Dear Peter,
> the problem is fixed now. Thanks for reporting.
>

> In order to use a fixed version, you either need to wait for the next 
> release (hopefully sometime in Dec.) and update your Ruta Workbench, or 
> update to a snapshot version of the plugins.
> 
> Does one of these options work for you?
> 
>
Thanks a lot! Yes, that will be fine.


> Besides the initial problem, the comparison of the annotations in these 
> views is overall limited. A workaround could be to inspect and compare 
> the features of the annotations in the CAS manually.
>
Yes, this is something we did do, and it works.

Best,
Martí


-- 
*******************************************************
Dr. Martí Quixal
AB Schulpsychologie / Seminar für Sprachwissenschaft
Eberhard Karls Universität Tübingen
Schleichstr. 4
72076 Tübingen (Germany)

Phone: +49 7071 29-75422 / Fax: +49 7071 29-5902
E-Mail: marti.quixal@psycho.uni-tuebingen.de
*******************************************************

Re: RUTA - RUTA gold standard evaluation functionality

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


the problem is fixed now. Thanks for reporting.

In order to use a fixed version, you either need to wait for the next
release (hopefully sometime in Dec.) and update your Ruta Workbench, or
update to a snapshot version of the plugins.

Does one of these options work for you?


Besides the initial problem, the comparison of the annotations in these
views is overall limited. A workaround could be to inspect and compare
the features of the annotations in the CAS manually.
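
A minimal sketch of such a manual comparison using plain UIMA APIs; the file names, the TypeSystem.xml path, and the type name "my.Entity" below are illustrative placeholders, not anything prescribed by Ruta:

import java.io.File;
import java.util.Objects;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Feature;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.CasIOUtils;
import org.apache.uima.util.XMLInputSource;

public class CompareCasFeatures {

    public static void main(String[] args) throws Exception {
        // Both CASes must be created from the same type system description.
        TypeSystemDescription tsd = UIMAFramework.getXMLParser()
                .parseTypeSystemDescription(new XMLInputSource(new File("TypeSystem.xml")));
        CAS gold = CasCreationUtils.createCas(tsd, null, null);
        CAS result = CasCreationUtils.createCas(tsd, null, null);
        CasIOUtils.load(new File("test/doc1.xmi").toURI().toURL(), gold);
        CasIOUtils.load(new File("output/doc1.xmi").toURI().toURL(), result);

        Type type = gold.getTypeSystem().getType("my.Entity"); // placeholder type name
        for (AnnotationFS g : gold.getAnnotationIndex(type)) {
            for (AnnotationFS r : result.getAnnotationIndex(type)) {
                if (g.getBegin() != r.getBegin() || g.getEnd() != r.getEnd()) {
                    continue; // only compare annotations with identical offsets
                }
                // Report every primitive feature whose values differ.
                for (Feature f : type.getFeatures()) {
                    if (!f.getRange().isPrimitive()) {
                        continue;
                    }
                    String gv = g.getFeatureValueAsString(f);
                    String rv = r.getFeatureValueAsString(f);
                    if (!Objects.equals(gv, rv)) {
                        System.out.printf("[%d,%d] %s: gold=%s result=%s%n",
                                g.getBegin(), g.getEnd(), f.getShortName(), gv, rv);
                    }
                }
            }
        }
    }
}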


Best,


Peter


-- 
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Phone: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó


Re: RUTA - RUTA gold standard evaluation functionality

Posted by Peter Klügl <pe...@averbis.com>.
Hi Martí,


yes, this is the right place. I'll take a look and get back to you tomorrow.


Best,


Peter




Re: RUTA - RUTA gold standard evaluation functionality

Posted by Peter Klügl <pe...@averbis.com>.
Hi,


Which Ruta version do you use?


The "error" is a problem visualizing the compared annotations. Most
likely, the feature value for the compared annotation is emtpy.

I was able to reproduce it and I am investigating it further.


Which evaluator do you use? If you haven't changed the default, then you
only evaluate the offsets (begin/end) and not the feature values (and I
suppose you want to evaluate those).

Do you restrict the types you want to evaluate?
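
To illustrate the difference, here is a rough sketch of what an offset-only comparison checks; the class and method below are illustrative, not the Workbench's actual evaluator API:

import org.apache.uima.cas.text.AnnotationFS;

public final class OffsetMatch {

    // Under offset-only evaluation, a result annotation counts as a true
    // positive if its type and offsets agree with a gold annotation;
    // feature values are never inspected.
    static boolean isExactMatch(AnnotationFS gold, AnnotationFS result) {
        return gold.getType().getName().equals(result.getType().getName())
                && gold.getBegin() == result.getBegin()
                && gold.getEnd() == result.getEnd();
    }
}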


Best,


Peter

