Posted to issues@opennlp.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2019/06/07 12:58:00 UTC

[jira] [Commented] (OPENNLP-1266) Limit normalization regexes

    [ https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858605#comment-16858605 ] 

Tim Allison commented on OPENNLP-1266:
--------------------------------------

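The cost grows much faster than linearly: going from 1,000 to 10,000 characters (10x) takes roughly 33x as long. Timings for an input string consisting solely of 'a' characters, at various lengths: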

||Length (chars)||Time (ms)||
|1000|36|
|2000|92|
|3000|97|
|4000|253|
|5000|363|
|6000|492|
|7000|768|
|8000|724|
|9000|990|
|10000|1192|
|11000|1375|
|12000|1967|
|13000|2104|
|14000|2190|
|15000|2402|
|16000|2992|
|17000|3338|
|18000|3865|
|19000|3888|
|20000|4270|
|21000|4722|
|22000|5731|
|23000|7238|
|24000|6235|
|25000|6984|
|26000|7613|
|27000|7794|
|28000|8981|
|29000|9028|
|30000|10086|
|31000|10740|
|32000|10908|
|33000|12710|
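
To illustrate how an unbounded quantifier can produce this kind of growth, here is a minimal sketch. The two patterns, the class name BoundedRegexSketch, and the limit of 64 are assumptions made up for illustration; they are not OpenNLP's actual normalizer regexes or the fix applied in this issue. On a string of only 'a's, the unbounded pattern consumes the rest of the input from every start position and then backtracks looking for an '@' that never appears, so the work would be expected to grow roughly quadratically with length. Capping the repetition limits the work per start position to a constant.

```java
import java.util.regex.Pattern;

public class BoundedRegexSketch {
    // Hypothetical patterns for illustration only; NOT OpenNLP's real regexes.
    static final Pattern UNBOUNDED = Pattern.compile("[a-z]+@[a-z]+");
    // Same shape, but each run is capped (64 is an arbitrary, assumed limit).
    static final Pattern BOUNDED = Pattern.compile("[a-z]{1,64}@[a-z]{1,64}");

    // Replace every match with a single space, as a normalizer might.
    static String normalize(Pattern p, CharSequence text) {
        return p.matcher(text).replaceAll(" ");
    }

    // Build a string of n copies of c (avoids String.repeat for older JDKs).
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Wall-clock time of one normalize call, in milliseconds (rough, no warm-up).
    static long timeMillis(Pattern p, String text) {
        long start = System.nanoTime();
        normalize(p, text);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("length\tunbounded(ms)\tbounded(ms)");
        for (int len = 1000; len <= 16000; len *= 2) {
            String s = repeat('a', len);
            System.out.println(len + "\t" + timeMillis(UNBOUNDED, s)
                    + "\t" + timeMillis(BOUNDED, s));
        }
    }
}
```

The trade-off is that a bounded pattern no longer matches runs longer than the cap in full, so the limit has to be chosen large enough for realistic input.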

> Limit normalization regexes
> ---------------------------
>
>                 Key: OPENNLP-1266
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1266
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Several of the normalizer regexes are not bounded.  In rare cases, this can cause eye-opening performance costs.


