Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/06/06 11:11:58 UTC
[jira] [Created] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop
POS Tagger Sequence streams calls generateEvents in a loop
-----------------------------------------------------------
Key: OPENNLP-196
URL: https://issues.apache.org/jira/browse/OPENNLP-196
Project: OpenNLP
Issue Type: Bug
Components: POS Tagger
Affects Versions: tools-1.5.1-incubating
Reporter: Jörn Kottmann
Assignee: Jörn Kottmann
Priority: Trivial
Fix For: tools-1.5.2-incubating
The POS Tagger Sequence Stream class calls generateEvents in a loop, but one call is enough.
To fix this issue remove the loop around generateEvents.
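To illustrate why the loop is redundant, here is a minimal Python sketch (not OpenNLP code, which is Java). `generate_events` and `CountingGenerator` are hypothetical stand-ins, assuming that a single call already returns every event of a sentence; the loop bound (one pass per token) is also an assumption, since the issue does not say what the loop ranged over.

```python
from typing import List, Tuple

# Hypothetical event representation: (context feature, outcome tag).
Event = Tuple[str, str]


class CountingGenerator:
    """Hypothetical stand-in for generateEvents that counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def generate_events(self, tokens: List[str], tags: List[str]) -> List[Event]:
        # Assumption: one call produces one event per token for the whole sentence.
        self.calls += 1
        return [(f"w={token}", tag) for token, tag in zip(tokens, tags)]


def events_with_loop(gen: CountingGenerator, tokens: List[str], tags: List[str]) -> List[Event]:
    # Redundant pattern (assumed loop bound): every pass recomputes all events,
    # and only the result of the last pass is kept.
    events: List[Event] = []
    for _ in range(len(tokens)):
        events = gen.generate_events(tokens, tags)
    return events


def events_single_call(gen: CountingGenerator, tokens: List[str], tags: List[str]) -> List[Event]:
    # Fixed pattern: the loop is removed and the generator runs once.
    return gen.generate_events(tokens, tags)


tokens = ["The", "dog", "barks", "."]
tags = ["DT", "NN", "VBZ", "."]

looped, single = CountingGenerator(), CountingGenerator()
assert events_with_loop(looped, tokens, tags) == events_single_call(single, tokens, tags)
print(looped.calls, single.calls)  # 4 1
```

Both variants return the same events; the looped one just repeats the full computation on every pass, which is the wasted work that removing the loop eliminates.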
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop
Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044774#comment-13044774 ]
Jörn Kottmann commented on OPENNLP-196:
---------------------------------------
The fix reduces the total training time for 5 iterations from 24 minutes to 14 minutes. Pre-processing and post-processing are identical in both cases, so the fix shortens the training phase itself by at least 50%.
To measure the performance improvement more accurately, more iterations are needed.
[jira] [Commented] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop
Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044776#comment-13044776 ]
Jörn Kottmann commented on OPENNLP-196:
---------------------------------------
First test without the fix:
Got 64691 sequences
Indexing events using cutoff of 5
Computing event counts... done. 1422335 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1422335
Number of Outcomes: 45
Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
1: . (1338465/1422335) 0.9410335821026692
2: . (1362083/1422335) 0.9576386716209613
3: . (1368995/1422335) 0.962498286268706
4: . (1373167/1422335) 0.9654314911747233
5: . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (4.927s)
Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin
real 24m35.059s
user 24m31.178s
sys 1m10.894s
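The training-progress lines above appear to have the form `iteration: . (correct/total) ratio`; this reading is inferred from the numbers themselves, not from OpenNLP documentation. The Python sketch below parses the lines copied from the first run and checks that each printed ratio equals the quotient of the two counts, and that every denominator matches the reported `Number of Event Tokens: 1422335`. The second run reports the same figures.

```python
import re
from typing import List, Optional, Tuple

# Training-progress lines copied verbatim from the first run above.
LOG = """\
1: . (1338465/1422335) 0.9410335821026692
2: . (1362083/1422335) 0.9576386716209613
3: . (1368995/1422335) 0.962498286268706
4: . (1373167/1422335) 0.9654314911747233
5: . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
"""

# Optional "N: " iteration prefix, then ". (correct/total) ratio" (inferred format).
LINE = re.compile(r"(?:(\d+): )?\. \((\d+)/(\d+)\) ([\d.]+)")

Row = Tuple[Optional[int], int, int, float]


def parse(log: str) -> List[Row]:
    rows: List[Row] = []
    for line in log.splitlines():
        match = LINE.fullmatch(line.strip())
        if match is None:
            continue  # skip lines that are not progress lines
        iteration, correct, total, ratio = match.groups()
        rows.append((int(iteration) if iteration else None, int(correct), int(total), float(ratio)))
    return rows


rows = parse(LOG)
for iteration, correct, total, ratio in rows:
    # The printed value matches correct/total up to floating-point noise,
    # and the denominator equals the reported number of event tokens.
    assert abs(correct / total - ratio) < 1e-12
    assert total == 1422335
print(len(rows))  # 6
```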
Second test with the fix:
Got 64691 sequences
Indexing events using cutoff of 5
Computing event counts... done. 1422335 events
Indexing... done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1422335
Number of Outcomes: 45
Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
1: . (1338465/1422335) 0.9410335821026692
2: . (1362083/1422335) 0.9576386716209613
3: . (1368995/1422335) 0.962498286268706
4: . (1373167/1422335) 0.9654314911747233
5: . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (5.564s)
Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin
real 14m34.409s
user 13m28.532s
sys 0m36.698s
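The estimate in the first comment can be checked against the `real` (wall-clock) times of the two runs. Because the pre- and post-processing time is shared by both runs, the reduction of the training phase alone is the time saved divided by (total time without the fix minus that shared overhead). The overhead is not measured in these logs, so the sketch below treats it as a free parameter; `to_seconds` and `training_reduction` are illustrative helpers, not part of OpenNLP.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a `time`-style duration such as '24m35.059s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times copied from the two runs above.
before = to_seconds("24m35.059s")  # without the fix
after = to_seconds("14m34.409s")   # with the fix
saved = before - after

# Relative reduction of the total wall-clock time.
total_reduction = saved / before


def training_reduction(overhead: float) -> float:
    """Reduction of the training phase alone, given a shared overhead in seconds."""
    return saved / (before - overhead)


# Smallest shared overhead for which the training phase alone shrinks by 50%:
# saved / (before - c) >= 0.5  <=>  c >= before - 2 * saved
min_overhead_for_half = before - 2 * saved

print(f"total reduction: {total_reduction:.1%}")
print(f"50% training-phase reduction needs overhead >= {min_overhead_for_half:.0f} s")
```

With these numbers the total wall-clock time drops by about 40.7%, and the training phase alone shrinks by 50% or more if the shared pre- and post-processing overhead is at least about 274 seconds (roughly 4.6 minutes).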
[jira] [Closed] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop
Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jörn Kottmann closed OPENNLP-196.
---------------------------------
Resolution: Fixed