Posted to issues@opennlp.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/29 13:11:00 UTC

[jira] [Commented] (OPENNLP-1266) Limit normalization regexes in UrlCharSequenceNormalizer

    [ https://issues.apache.org/jira/browse/OPENNLP-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17514077#comment-17514077 ] 

ASF GitHub Bot commented on OPENNLP-1266:
-----------------------------------------

jzonthemtn commented on pull request #355:
URL: https://github.com/apache/opennlp/pull/355#issuecomment-1081851392


   The previously referenced PR #399 was merged. Since this PR has been open for some time without updates, I am going to close it. If it needs to be reopened, please feel free to do so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


> Limit normalization regexes in UrlCharSequenceNormalizer
> --------------------------------------------------------
>
>                 Key: OPENNLP-1266
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1266
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>
> The {{MAIL_REGEX}} in UrlCharSequenceNormalizer is unbounded and requires backtracking. In rare cases, this can cause eye-opening performance costs.
>  
> I tested the regexes in the other normalizers. I could be wrong, but they don't appear to require backtracking, and I saw no surprising performance costs.
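
To illustrate what "limiting" a normalization regex can mean, here is a minimal sketch in Java. It is not the actual OpenNLP code or the change that was merged: the class name, both patterns, and the cap of 100 characters are assumptions chosen for illustration. The idea is that replacing an unbounded quantifier such as `+` with a bounded one such as `{1,100}` caps how far the regex engine can scan, and therefore how much it can backtrack, from any single start position.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names, patterns, and limits are illustrative only
// and are not taken from UrlCharSequenceNormalizer.
final class BoundedMailRegexSketch {

    // Unbounded form: each '+' may consume arbitrarily long runs, so on long
    // inputs without a usable '@' the engine rescans from every start position.
    static final Pattern UNBOUNDED_MAIL =
        Pattern.compile("[-_.0-9A-Za-z]+@[-_0-9A-Za-z]+[-_.0-9A-Za-z]+");

    // Bounded form: each run is capped at 100 characters (an assumed limit),
    // which bounds the work done per start position.
    static final Pattern BOUNDED_MAIL =
        Pattern.compile("[-_.0-9A-Za-z]{1,100}@[-_0-9A-Za-z]{1,100}[-_.0-9A-Za-z]{1,100}");

    // Replaces every match of the bounded pattern with a single space.
    static String normalize(CharSequence text) {
        return BOUNDED_MAIL.matcher(text).replaceAll(" ");
    }

    private BoundedMailRegexSketch() {
    }
}
```

The trade-off of this approach is that a run longer than the cap is no longer matched as a whole, so the limit should be picked generously relative to realistic input.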



--
This message was sent by Atlassian Jira
(v8.20.1#820001)