Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/05/16 15:40:47 UTC

[jira] [Closed] (OPENNLP-172) Replace the regex token class feature generation with the fast string pattern implementation

     [ https://issues.apache.org/jira/browse/OPENNLP-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann closed OPENNLP-172.
---------------------------------

    Resolution: Fixed

> Replace the regex token class feature generation with the fast string pattern implementation
> --------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-172
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-172
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: tools-1.5.1-incubating
>            Reporter: Jörn Kottmann
>            Assignee: Jörn Kottmann
>            Priority: Minor
>             Fix For: tools-1.5.2-incubating
>
>
> The token class feature is currently computed with regular expressions, which are slower than the new fast token class method that uses the Character class to compute the token class.
> The old regular-expression-based token class feature computation should be replaced with the new fast token class method.
> The output of both methods is identical, so the change will not break backward compatibility, but it will increase the throughput of the name finder by roughly 10%.
> In a measurement on the Leipzig corpus with 300K sentences, throughput increased from 556 sent/s to 618 sent/s.
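
The difference between the two approaches can be illustrated with a minimal Java sketch. This is not the OpenNLP code: the class name TokenClassSketch, both method names, and the five class labels are hypothetical and deliberately reduced. It only shows how a single pass over the characters with java.lang.Character can yield the same label as a chain of regular expression matches.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; names and labels are assumptions, not OpenNLP API.
final class TokenClassSketch {

    // Regex variant: one precompiled pattern per token class.
    private static final Pattern LOWERCASE = Pattern.compile("\\p{javaLowerCase}+");
    private static final Pattern DIGITS = Pattern.compile("\\p{Nd}+");
    private static final Pattern ALL_CAPS = Pattern.compile("\\p{javaUpperCase}+");
    private static final Pattern INIT_CAP =
            Pattern.compile("\\p{javaUpperCase}\\p{javaLowerCase}+");

    // Tries each pattern in turn; matches() requires the whole token to match.
    static String regexTokenClass(String token) {
        if (LOWERCASE.matcher(token).matches()) return "lowercase";
        if (DIGITS.matcher(token).matches()) return "digits";
        if (ALL_CAPS.matcher(token).matches()) return "allcaps";
        if (INIT_CAP.matcher(token).matches()) return "initcap";
        return "other";
    }

    // Character-based variant: count character kinds in a single pass.
    static String fastTokenClass(String token) {
        int n = token.length();
        if (n == 0) return "other";

        int lower = 0, upper = 0, digit = 0;
        for (int i = 0; i < n; i++) {
            char c = token.charAt(i);
            if (Character.isLowerCase(c)) lower++;
            else if (Character.isUpperCase(c)) upper++;
            else if (Character.isDigit(c)) digit++;
        }

        if (lower == n) return "lowercase";
        if (digit == n) return "digits";
        if (upper == n) return "allcaps";
        // Exactly one upper-case character, in first position, followed by lower case.
        if (upper == 1 && lower == n - 1 && Character.isUpperCase(token.charAt(0))) {
            return "initcap";
        }
        return "other";
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "HELLO", "Hello", "2011", "OpenNLP", "x1"};
        for (String s : samples) {
            System.out.println(s + " regex=" + regexTokenClass(s)
                    + " fast=" + fastTokenClass(s));
        }
    }
}
```

Both methods are meant to agree on ordinary tokens such as the samples in main. The sketch iterates over char values, so tokens containing supplementary Unicode characters are outside what it covers.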
