Posted to issues@opennlp.apache.org by "Jeff Zemerick (Jira)" <ji...@apache.org> on 2022/01/19 13:58:00 UTC

[jira] [Updated] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

     [ https://issues.apache.org/jira/browse/OPENNLP-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zemerick updated OPENNLP-1214:
-----------------------------------
    Fix Version/s:     (was: 1.9.4)

> use hash to avoid linear search in DefaultEndOfSentenceScanner
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1214
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1214
>             Project: OpenNLP
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>
> When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to check whether each character in the sentence is one of the EOS (end-of-sentence) characters. I think we should keep eosCharacters in a HashSet instead of a char[].
> Along with this replacement, I'd like to deprecate getEndOfSentenceCharacters(), because it returns a char[] and nothing in OpenNLP currently calls it, and add an equivalent method that returns the EOS characters as a Set<Character>. A set cannot preserve the order of the EOS characters, but I don't think that will be a problem.
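The proposal can be illustrated with a minimal sketch. It is not OpenNLP's actual code. The class name HashedEosScanner and the method names getPositions and getEOSCharacters are hypothetical; only getEndOfSentenceCharacters() comes from the issue text. The EOS characters are stored once in a HashSet, so each per-character check is an expected O(1) hash lookup rather than a scan over a char[]. The legacy char[] accessor is kept and marked deprecated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not OpenNLP's DefaultEndOfSentenceScanner): EOS
 * characters live in a HashSet, so membership tests avoid a linear search.
 */
public class HashedEosScanner {

  private final Set<Character> eosCharacters;

  public HashedEosScanner(char[] eosCharacters) {
    Set<Character> set = new HashSet<>();
    for (char c : eosCharacters) {
      set.add(c); // autoboxed to Character
    }
    this.eosCharacters = Collections.unmodifiableSet(set);
  }

  /** Returns the indices of all characters in cbuf that are EOS characters. */
  public List<Integer> getPositions(char[] cbuf) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < cbuf.length; i++) {
      // Single hash lookup per character instead of an inner loop over a char[].
      if (eosCharacters.contains(cbuf[i])) {
        positions.add(i);
      }
    }
    return positions;
  }

  public List<Integer> getPositions(String s) {
    return getPositions(s.toCharArray());
  }

  /** Hypothetical Set-returning replacement; iteration order is unspecified. */
  public Set<Character> getEOSCharacters() {
    return eosCharacters;
  }

  /** Legacy char[] accessor kept for compatibility; order is unspecified. */
  @Deprecated
  public char[] getEndOfSentenceCharacters() {
    char[] result = new char[eosCharacters.size()];
    int i = 0;
    for (Character c : eosCharacters) {
      result[i++] = c;
    }
    return result;
  }

  public static void main(String[] args) {
    HashedEosScanner scanner = new HashedEosScanner(new char[] {'.', '!', '?'});
    System.out.println(scanner.getPositions("Hi. How are you? Fine!")); // prints [2, 15, 21]
  }
}
```

Because the deprecated accessor rebuilds a char[] from the set, its element order may differ from the order passed to the constructor. This is the trade-off the issue accepts.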



--
This message was sent by Atlassian Jira
(v8.20.1#820001)