Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/06/06 11:11:58 UTC

[jira] [Created] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop

POS Tagger Sequence streams calls generateEvents in a loop 
-----------------------------------------------------------

                 Key: OPENNLP-196
                 URL: https://issues.apache.org/jira/browse/OPENNLP-196
             Project: OpenNLP
          Issue Type: Bug
          Components: POS Tagger
    Affects Versions: tools-1.5.1-incubating
            Reporter: Jörn Kottmann
            Assignee: Jörn Kottmann
            Priority: Trivial
             Fix For: tools-1.5.2-incubating


The POS Tagger Sequence Stream class calls generateEvents in a loop, but one call is enough.

To fix this issue, remove the loop around generateEvents.
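
For illustration, a minimal Java sketch of the pattern described above. It is not the OpenNLP source: the class name, the two wrapper methods and the counter are hypothetical, and generateEvents here is a stand-in that assumes a single call already produces the events for every token of a sentence.

```java
import java.util.ArrayList;
import java.util.List;

class GenerateEventsSketch {
    // Hypothetical counter: how many per-token contexts were built in total.
    static int contextsBuilt = 0;

    // Stand-in for generateEvents (assumption): one event per token of the sentence.
    static List<String> generateEvents(String[] tokens, String[] tags) {
        List<String> events = new ArrayList<>(tokens.length);
        for (int i = 0; i < tokens.length; i++) {
            contextsBuilt++;
            events.add(tags[i] + " <- " + tokens[i]);
        }
        return events;
    }

    // Pattern reported in the issue: the call sits inside a per-token loop,
    // so the whole sentence is regenerated on every pass.
    static List<String> eventsWithLoop(String[] tokens, String[] tags) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            events = generateEvents(tokens, tags);
        }
        return events;
    }

    // Proposed fix: drop the loop and call once per sentence.
    static List<String> eventsWithSingleCall(String[] tokens, String[] tags) {
        return generateEvents(tokens, tags);
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "dog", "barks"};
        String[] tags = {"DT", "NN", "VBZ"};

        contextsBuilt = 0;
        List<String> looped = eventsWithLoop(tokens, tags);
        int withLoop = contextsBuilt;

        contextsBuilt = 0;
        List<String> once = eventsWithSingleCall(tokens, tags);
        int withSingleCall = contextsBuilt;

        System.out.println("same events: " + looped.equals(once));
        System.out.println("contexts built with loop: " + withLoop);
        System.out.println("contexts built with single call: " + withSingleCall);
    }
}
```

Under these assumptions both variants return the same events, and only the amount of work differs: the looped variant builds a number of contexts that grows with the square of the sentence length, the single call with the length itself.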

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044774#comment-13044774 ] 

Jörn Kottmann commented on OPENNLP-196:
---------------------------------------

The fix reduces the total training time for 5 iterations from 24 minutes to 14 minutes, roughly a 40% drop in total run time. Pre-processing and post-processing are identical in both cases, so the whole saving comes out of the training iterations themselves; by my estimate the fix shortens that part by at least 50%.

To measure the performance improvement more accurately, more iterations are needed.



[jira] [Commented] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044776#comment-13044776 ] 

Jörn Kottmann commented on OPENNLP-196:
---------------------------------------

First test without the fix:

Got 64691 sequences
Indexing events using cutoff of 5

	Computing event counts...  done. 1422335 events
	Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...  
done.
	Number of Event Tokens: 1422335
	    Number of Outcomes: 45
	  Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
  1:  . (1338465/1422335) 0.9410335821026692
  2:  . (1362083/1422335) 0.9576386716209613
  3:  . (1368995/1422335) 0.962498286268706
  4:  . (1373167/1422335) 0.9654314911747233
  5:  . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (4.927s)

Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin


real	24m35.059s
user	24m31.178s
sys	1m10.894s


Second test with the fix:

Got 64691 sequences
Indexing events using cutoff of 5

	Computing event counts...  done. 1422335 events
	Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...  
done.
	Number of Event Tokens: 1422335
	    Number of Outcomes: 45
	  Number of Predicates: 103444
Computing model parameters...
Performing 5 iterations.
  1:  . (1338465/1422335) 0.9410335821026692
  2:  . (1362083/1422335) 0.9576386716209613
  3:  . (1368995/1422335) 0.962498286268706
  4:  . (1373167/1422335) 0.9654314911747233
  5:  . (1376065/1422335) 0.9674689858577621
. (1381396/1422335) 0.971217048023145
...done.
Writing pos tagger model ... Compressed 103444 parameters to 74134
22966 outcome patterns
done (5.564s)

Wrote pos tagger model to
path: /Users/joern/dev/opennlp-apache/opennlp/opennlp-tools/en-pos.bin


real	14m34.409s
user	13m28.532s
sys	0m36.698s
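
A small Java sketch for comparing the two timings above. The class and method names are made up for this example, and the parser assumes the "<minutes>m<seconds>s" format printed by the shell's time command as shown in the logs. It compares whole runs, so it says nothing about how much of each run is pre-processing or post-processing.

```java
class TimingComparison {
    // Converts a value such as "24m35.059s" to seconds.
    static double toSeconds(String t) {
        int m = t.indexOf('m');
        double minutes = Double.parseDouble(t.substring(0, m));
        double seconds = Double.parseDouble(t.substring(m + 1, t.length() - 1));
        return minutes * 60 + seconds;
    }

    // Fraction of the original time that was saved, between 0 and 1.
    static double reduction(String before, String after) {
        double b = toSeconds(before);
        return (b - toSeconds(after)) / b;
    }

    public static void main(String[] args) {
        // Values copied from the two test runs above.
        double real = reduction("24m35.059s", "14m34.409s");
        double user = reduction("24m31.178s", "13m28.532s");
        System.out.printf("real time saved: %.1f%%%n", 100 * real);
        System.out.printf("user time saved: %.1f%%%n", 100 * user);
    }
}
```

On the figures above this comes out to roughly 41% less wall-clock time and roughly 45% less user time for the complete run.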




[jira] [Closed] (OPENNLP-196) POS Tagger Sequence streams calls generateEvents in a loop

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann closed OPENNLP-196.
---------------------------------

    Resolution: Fixed

