Posted to issues@opennlp.apache.org by "Koji Sekiguchi (JIRA)" <ji...@apache.org> on 2018/08/14 03:35:00 UTC

[jira] [Created] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner

Koji Sekiguchi created OPENNLP-1214:
---------------------------------------

             Summary: use hash to avoid linear search in DefaultEndOfSentenceScanner
                 Key: OPENNLP-1214
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1214
             Project: OpenNLP
          Issue Type: Improvement
    Affects Versions: 1.9.0
            Reporter: Koji Sekiguchi
             Fix For: 1.9.1


When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to check whether each character in the sentence is one of the eos characters. I think we'd better use a HashSet to hold eosCharacters instead of a char[].
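
To illustrate the difference, here is a minimal sketch and not the actual OpenNLP source. The class EosLookupSketch and its methods isEosLinear, toSet and scan are hypothetical names made up for this example. It contrasts a per-character linear search over a char[] with a lookup in a Set<Character>:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the OpenNLP implementation.
class EosLookupSketch {

  // Linear search: every check walks the array until a match is found.
  static boolean isEosLinear(char[] eosChars, char c) {
    for (char eos : eosChars) {
      if (eos == c) {
        return true;
      }
    }
    return false;
  }

  // Build the set once, up front, from the configured eos characters.
  static Set<Character> toSet(char[] eosChars) {
    Set<Character> set = new HashSet<>();
    for (char eos : eosChars) {
      set.add(eos);
    }
    return Collections.unmodifiableSet(set);
  }

  // Collect the positions of all eos characters in the text,
  // using a hash lookup per character instead of an array walk.
  static List<Integer> scan(CharSequence text, Set<Character> eosChars) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < text.length(); i++) {
      if (eosChars.contains(text.charAt(i))) {
        positions.add(i);
      }
    }
    return positions;
  }
}
```

With the set, the cost of each check is expected to stay roughly constant however many eos characters are configured, whereas the linear search grows with the array length. For a very small array the practical difference may be modest, so this is worth measuring rather than assuming.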

In accordance with this replacement, I'd like to deprecate getEndOfSentenceCharacters() because it returns char[] and nothing in OpenNLP calls it at present, and I'd like to add an equivalent method that returns a Set<Character> of the eos chars. The set cannot preserve the order of the eos chars, but I don't think that will be a problem.
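
The API change could look roughly like the following sketch. Only getEndOfSentenceCharacters() is a name taken from this issue; the class EosScannerSketch and the new accessor getEosCharacterSet() are hypothetical names chosen for illustration, and the real patch may differ:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the scanner; not the OpenNLP class itself.
class EosScannerSketch {

  private final Set<Character> eosCharacters;

  EosScannerSketch(char[] eosChars) {
    Set<Character> set = new HashSet<>();
    for (char c : eosChars) {
      set.add(c);
    }
    this.eosCharacters = Collections.unmodifiableSet(set);
  }

  /**
   * Kept for backward compatibility. The array order is unspecified
   * because it is rebuilt from the set.
   *
   * @deprecated use {@link #getEosCharacterSet()} instead
   */
  @Deprecated
  char[] getEndOfSentenceCharacters() {
    char[] result = new char[eosCharacters.size()];
    int i = 0;
    for (char c : eosCharacters) {
      result[i++] = c;
    }
    return result;
  }

  // Hypothetical name for the new Set-returning accessor.
  Set<Character> getEosCharacterSet() {
    return eosCharacters;
  }
}
```

If the order of the eos chars ever did turn out to matter, a LinkedHashSet would keep insertion order while still giving hash-based lookup.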


