Posted to dev@opennlp.apache.org by "william.colen@gmail.com" <wi...@gmail.com> on 2012/02/25 23:57:35 UTC
Confusion matrix report for POS Tagger evaluators
Hi,
I implemented a new EvaluationMonitor for the POS Tagger. It generates a
confusion matrix <http://en.wikipedia.org/wiki/Confusion_matrix> for each
token that was tagged incorrectly at least once.
Example output (Portuguese):
...
Accuracy for [que]: 91,34%
1316 ocurrencies. Confusion matrix (line: reference; column: predicted):
          | conj-s | pron-indp | adv | pron-det || % Accu ||
conj-s    |>   537<|        40 |   0 |        0 || 93,07% ||
pron-indp |     59 |>      661<|   0 |        0 || 91,81% ||
adv       |      2 |        12 |>  4<|        0 || 22,22% ||
pron-det  |      0 |         1 |   0 |>       0<||     0% ||

Accuracy for [o]: 98,48%
3949 ocurrencies. Confusion matrix (line: reference; column: predicted):
          | art  | pron-det | pron-pers | , || % Accu ||
art       |>3857<|        4 |         0 | 1 || 99,87% ||
pron-det  |   36 |>      24<|         0 | 0 ||    40% ||
pron-pers |   19 |        0 |>        8<| 0 || 29,63% ||
,         |    0 |        0 |         0 |>0<||     0% ||

Accuracy for [a]: 96%
4395 ocurrencies. Confusion matrix (line: reference; column: predicted):
          | art  | prp | pron-pers | pron-det || % Accu ||
art       |>3291<|  54 |         0 |        0 || 98,39% ||
prp       |  107 |>922<|         0 |        0 ||  89,6% ||
pron-pers |    4 |   0 |>        4<|        0 ||    50% ||
pron-det  |   11 |   0 |         0 |>       2<|| 15,38% ||
...
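For readers who want to see how such a report can be derived, here is a
minimal sketch in Python. It is not the OpenNLP implementation: the class
name TokenConfusionReport and its methods are hypothetical, and the only
real data is the set of [que] counts copied from the example output above.
The idea is to count (reference, predicted) tag pairs per token, then read
the overall and per-row accuracies off those counts.

```python
from collections import Counter, defaultdict


class TokenConfusionReport:
    """Hypothetical helper: (reference, predicted) tag counts per token."""

    def __init__(self):
        # token -> Counter keyed by (reference_tag, predicted_tag)
        self.counts = defaultdict(Counter)

    def add(self, token, reference, predicted):
        """Record one tagged occurrence of a token."""
        self.counts[token][(reference, predicted)] += 1

    def occurrences(self, token):
        return sum(self.counts.get(token, Counter()).values())

    def accuracy(self, token, reference=None):
        """Accuracy for a token; with `reference`, for one matrix row only."""
        cells = self.counts.get(token, Counter())
        total = sum(n for (ref, _), n in cells.items()
                    if reference in (None, ref))
        correct = sum(n for (ref, pred), n in cells.items()
                      if ref == pred and reference in (None, ref))
        return correct / total if total else 0.0

    def confused_tokens(self):
        """Tokens with at least one tagging error."""
        return [token for token, cells in self.counts.items()
                if any(ref != pred for ref, pred in cells)]


# Counts for [que], copied from the example output in the post.
que = {
    ("conj-s", "conj-s"): 537, ("conj-s", "pron-indp"): 40,
    ("pron-indp", "conj-s"): 59, ("pron-indp", "pron-indp"): 661,
    ("adv", "conj-s"): 2, ("adv", "pron-indp"): 12, ("adv", "adv"): 4,
    ("pron-det", "pron-indp"): 1,
}

report = TokenConfusionReport()
for (ref, pred), n in que.items():
    for _ in range(n):
        report.add("que", ref, pred)

print(report.occurrences("que"))               # 1316
print(f"{report.accuracy('que'):.2%}")         # 91.34%
print(f"{report.accuracy('que', 'adv'):.2%}")  # 22.22%
```

The printed figures match the post's 91,34% and 22,22%, apart from the
decimal separator, which depends on the locale used for formatting.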
Do you think it would be useful to make this report available?
I would add it to the CLI, where it would be activated by a new argument
that specifies an output file for the report.
Thank you,
William
Re: Confusion matrix report for POS Tagger evaluators
Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Thank you for the feedback.
I don't know if the report I have in mind for the POS Tagger would apply
to DocCat. I attached an example output to the Jira issue:
https://issues.apache.org/jira/browse/OPENNLP-449
On Sun, Feb 26, 2012 at 8:36 PM, Jörn Kottmann <ko...@gmail.com> wrote:
> +1 also needed for doccat.
>
> Maybe it can be created by a class which could also
> be used for doccat.
>
> Jörn
Re: Confusion matrix report for POS Tagger evaluators
Posted by Jörn Kottmann <ko...@gmail.com>.
+1 also needed for doccat.
Maybe it can be created by a class which could also
be used for doccat.
Jörn
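One way to read the suggestion above is a label-level class that knows
nothing about tokens or documents, so the POS Tagger and doccat evaluators
could both feed it. The sketch below is in Python and is only an
illustration under that assumption: ConfusionMatrix, its methods, and the
"sports"/"politics" categories are made up and are not part of OpenNLP.

```python
from collections import Counter


class ConfusionMatrix:
    """Hypothetical label-level confusion counts, shared by any evaluator."""

    def __init__(self):
        # (reference_label, predicted_label) -> count
        self.cells = Counter()

    def add(self, reference, predicted):
        self.cells[(reference, predicted)] += 1

    def labels(self):
        # Every label seen as a reference or as a prediction, sorted.
        return sorted({label for pair in self.cells for label in pair})

    def rows(self):
        """Yield (reference_label, counts per predicted label)."""
        labels = self.labels()
        for ref in labels:
            yield ref, [self.cells[(ref, pred)] for pred in labels]


# POS tagging: one observation per token.
pos = ConfusionMatrix()
for ref, pred in [("art", "art"), ("art", "prp"), ("prp", "prp")]:
    pos.add(ref, pred)

# Document categorization: one observation per document (made-up labels).
doccat = ConfusionMatrix()
for ref, pred in [("sports", "sports"), ("politics", "sports")]:
    doccat.add(ref, pred)

print(list(pos.rows()))     # [('art', [1, 1]), ('prp', [0, 1])]
print(list(doccat.rows()))  # [('politics', [0, 1]), ('sports', [0, 1])]
```

The per-token report for the POS Tagger could then keep one such matrix
per token, while doccat would need only a single matrix.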
Re: Confusion matrix report for POS Tagger evaluators
Posted by Jason Baldridge <ja...@gmail.com>.
+1 Fine-grained error analysis FTW!
--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge