Posted to dev@opennlp.apache.org by "william.colen@gmail.com" <wi...@gmail.com> on 2011/03/06 03:43:28 UTC

Chunk tests in 1.5.1-rc2 and 1.5.0

*Results of Chunker evaluation with public data*

*Component:* Chunker

*Data:* CoNLL 2000

*Tester:* colen

*Tagging Perf 1.5.0:*

*Tagging Perf 1.5.1:*
Precision: 0.9255923572240226
Recall: 0.9220610430991112
F-Measure: 0.9238233255623465

*Comment:*
The ChunkerEvaluator tool was not available in 1.5.0. To check whether
anything changed, I compared the output of 1.5.0 and 1.5.1 in a way similar
to the "Compatibility Test with OpenNLP 1.5.0 SourceForge Models". The output
changed slightly because of a bug fixed in 1.5.1 (a missing trailing closing
bracket).
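
A comparison of this kind can be sketched in a few lines of Java. This is only an illustration, not the procedure actually used for these tests: the class name OutputDiffSketch, the method differingLines, and the sample lines are all hypothetical, and the bracketed strings merely imitate chunker output. In practice the two lists would be read from the two output.txt files, for example with Files.readAllLines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which lines differ between two chunker outputs.
final class OutputDiffSketch {

    // Returns the 1-based numbers of the lines that differ.
    // Lines missing from the shorter output count as differences.
    static List<Integer> differingLines(List<String> oldOut, List<String> newOut) {
        List<Integer> diffs = new ArrayList<>();
        int max = Math.max(oldOut.size(), newOut.size());
        for (int i = 0; i < max; i++) {
            String a = i < oldOut.size() ? oldOut.get(i) : null;
            String b = i < newOut.size() ? newOut.get(i) : null;
            if (a == null || !a.equals(b)) {
                diffs.add(i + 1);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the first one lacks the trailing closing bracket.
        List<String> out150 = Arrays.asList("[NP He ] [VP left", "[NP She ] [VP stayed ] .");
        List<String> out151 = Arrays.asList("[NP He ] [VP left ]", "[NP She ] [VP stayed ] .");
        System.out.println("Differing lines: " + differingLines(out150, out151));
    }
}
```

Reporting line numbers rather than a plain yes/no makes it easy to inspect each difference by hand and confirm that it comes from the bracket fix and nothing else.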

------------------------

*Component:* Chunker

*Data:* Arvores Deitadas

*Tester:* colen

*Tagging Perf 1.5.0:*

*Tagging Perf 1.5.1:*
Precision: 0.9406086044071353
Recall: 0.9364814040952779
F-Measure: 0.9385404669668097

*Comment:*
The AD format for the Chunker was not available in 1.5.0.
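
As a sanity check on the figures above, the F-Measure should be the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall. It assumes this standard balanced definition; the class and method names are made up for illustration.

```java
// Hypothetical helper: recomputes the F-Measure from precision and recall.
final class FMeasureCheck {

    // Balanced F-Measure (harmonic mean of precision and recall).
    static double fMeasure(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid division by zero when both values are 0
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision and recall as reported in this message.
        double conll = fMeasure(0.9255923572240226, 0.9220610430991112);
        double ad = fMeasure(0.9406086044071353, 0.9364814040952779);
        System.out.println("CoNLL 2000 F-Measure:       " + conll);
        System.out.println("Arvores Deitadas F-Measure: " + ad);
    }
}
```

Both values should land very close to the reported F-Measure figures; small differences in the last digits would only reflect floating-point rounding.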

=========
Test details
=========

CoNLL 2000
================================================================================
1.5.1
--------------------------------------------------------------------------------
$ time ./bin/opennlp ChunkerTrainerME -lang en -encoding UTF8 -iterations 100 -cutoff 5 -data train.txt -model en-chunker.bin
real    4m39.469s

--------
$ time ./bin/opennlp ChunkerEvaluator -encoding UTF8 -data test.txt -model en-chunker.bin
Average: 161,7 sent/s
Total: 2013 sent
Runtime: 12.446s

Precision: 0.9255923572240226
Recall: 0.9220610430991112
F-Measure: 0.9238233255623465

real    0m13.356s

--------
$ time ./bin/opennlp ChunkerME en-chunker.bin < test_pos.txt > output.txt
Loading Chunker model ... done (0,650s)

Average: 167,3 sent/s
Total: 2012 sent
Runtime: 12.024s

real    0m12.906s


1.5.0
--------------------------------------------------------------------------------
$ time ./bin/opennlp ChunkerTrainerME -lang en -encoding UTF8 -iterations 100 -cutoff 5 -data ../apache-opennlp/train.txt -model en-chunker.bin
real    5m12.107s

--------
$ time ./bin/opennlp ChunkerME en-chunker.bin < ../apache-opennlp/test_pos.txt > output.txt
Loading Chunker model ... done (0,649s)

Average: 169,5 sent/s
Total: 2012 sent
Runtime: 11.869s

real    0m12.752s

Arvores Deitadas
================================================================================

1.5.1
--------------------------------------------------------------------------------
$ bin/opennlp ChunkerConverter ad -encoding ISO-8859-1 -data ../wrk/corpus/Bosque_CF_8.0.ad.txt > bosque-chunk
$ time ./bin/opennlp ChunkerTrainerME -lang pt -encoding UTF8 -iterations 100 -cutoff 5 -data bosque-chunk_train.txt -model pt-chunker.bin

real    0m56.778s

--------
$ time ./bin/opennlp ChunkerEvaluator -encoding UTF8 -data bosque-chunk_test.txt -model pt-chunker.bin
Loading Chunker model ... done (0,245s)
Average: 145,5 sent/s
Total: 411 sent
Runtime: 2.825s

Precision: 0.9406086044071353
Recall: 0.9364814040952779
F-Measure: 0.9385404669668097

real    0m3.332s

Re: Chunk tests in 1.5.1-rc2 and 1.5.0

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
There was no specific Jira for this. I noticed the issue while working on
https://issues.apache.org/jira/browse/OPENNLP-85.
Sorry, I think I should have opened a specific Jira for it.

The old ChunkSample.getPhrasesAsSpanList() was not handling leftovers, which
is why the trailing closing bracket was missing. I modified the method to use
the algorithm from NameFinder, which was already very well tested.
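
To illustrate what handling leftovers means here, the following is a minimal sketch and not the actual OpenNLP code. It assumes B-/I-/O style chunk tags and uses a small local Span class instead of the library's own; the class name ChunkSpanSketch and the method phrasesAsSpans are hypothetical. The point is the final check after the loop: without it, a phrase that runs to the end of the sentence is never emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of turning chunk tags into phrase spans.
final class ChunkSpanSketch {

    // Simple stand-in for a span: [start, end) token indexes plus a phrase type.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + ".." + end + ")";
        }
    }

    static List<Span> phrasesAsSpans(String[] tags) {
        List<Span> spans = new ArrayList<>();
        int start = -1;      // -1 means no phrase is currently open
        String type = null;
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            boolean continues = start >= 0 && tag.startsWith("I-")
                    && tag.substring(2).equals(type);
            if (!continues) {
                if (start >= 0) {
                    // Close the phrase that was open before this token.
                    spans.add(new Span(start, i, type));
                    start = -1;
                    type = null;
                }
                if (tag.startsWith("B-")) {
                    start = i;
                    type = tag.substring(2);
                }
                // "O" tags and stray "I-" tags open nothing (a simplification).
            }
        }
        // Leftover: a phrase still open at the end of the sentence.
        if (start >= 0) {
            spans.add(new Span(start, tags.length, type));
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tags = {"B-NP", "I-NP", "B-VP", "O", "B-NP", "I-NP"};
        System.out.println(phrasesAsSpans(tags));
    }
}
```

In the sample tag sequence the last two tokens form a phrase that is closed only by the end of the input, so it is the leftover check that adds it to the result.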


On Tue, Mar 15, 2011 at 12:31 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 3/6/11 3:43 AM, william.colen@gmail.com wrote:
>
>> *Comment:*
>>
>> ChunkerEvaluator tool was not available in 1.5.0. To evaluate if something
>> changed I compared the output of 1.5.0 and 1.5.1 in a way similar to
>> "Compatibility Test with OpenNLP 1.5.0 SourceForge Models". The output
>> changed a little because of a bug fixed in 1.5.1 (missing trailing closing
>> bracket)
>>
>
> I am a little behind, sorry.
>
> Can you point me to a Jira issue for the mentioned bug fix?
>
> Thanks,
> Jörn
>

Re: Chunk tests in 1.5.1-rc2 and 1.5.0

Posted by Jörn Kottmann <ko...@gmail.com>.
On 3/6/11 3:43 AM, william.colen@gmail.com wrote:
> *Comment:*
> ChunkerEvaluator tool was not available in 1.5.0. To evaluate if something
> changed I compared the output of 1.5.0 and 1.5.1 in a way similar to
> "Compatibility Test with OpenNLP 1.5.0 SourceForge Models". The output
> changed a little because of a bug fixed in 1.5.1 (missing trailing closing
> bracket)

I am a little behind, sorry.

Can you point me to a Jira issue for the mentioned bug fix?

Thanks,
Jörn