Posted to issues@opennlp.apache.org by "Koji Sekiguchi (JIRA)" <ji...@apache.org> on 2018/08/14 03:35:00 UTC
[jira] [Created] (OPENNLP-1214) use hash to avoid linear search in DefaultEndOfSentenceScanner
Koji Sekiguchi created OPENNLP-1214:
---------------------------------------
Summary: use hash to avoid linear search in DefaultEndOfSentenceScanner
Key: OPENNLP-1214
URL: https://issues.apache.org/jira/browse/OPENNLP-1214
Project: OpenNLP
Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Koji Sekiguchi
Fix For: 1.9.1
When DefaultEndOfSentenceScanner scans a sentence, it uses a linear search to check whether each character in the sentence is one of the eos characters. I think we'd better use a HashSet to keep eosCharacters instead of a char[].
In accordance with this replacement, I'd like to deprecate getEndOfSentenceCharacters(), because it returns char[] and nothing in OpenNLP calls it at present, and to add an equivalent method that returns a Set<Character> of eos chars. The Set cannot preserve the order of the eos chars, but I don't think that will be a problem.
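A minimal sketch of the idea, for illustration only: it is not the actual OpenNLP code or the patch for this issue. The class name HashedEosScannerSketch and its method names are hypothetical. It assumes the scanner only needs to report the positions of eos characters in a piece of text, and it replaces the per-character loop over a char[] with a HashSet lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the real DefaultEndOfSentenceScanner.
final class HashedEosScannerSketch {

    // eos characters kept in a hash set, so membership tests do not
    // need to walk an array for every character of the input.
    private final Set<Character> eosCharacters;

    HashedEosScannerSketch(char[] eosChars) {
        Set<Character> set = new HashSet<>();
        for (char c : eosChars) {
            set.add(c);
        }
        this.eosCharacters = Collections.unmodifiableSet(set);
    }

    // Set-returning accessor (hypothetical name). Iteration order is
    // unspecified, unlike the original char[].
    Set<Character> getEosCharacters() {
        return eosCharacters;
    }

    // Returns the index of every eos character found in the text.
    List<Integer> getPositions(CharSequence text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (eosCharacters.contains(text.charAt(i))) {
                positions.add(i);
            }
        }
        return positions;
    }
}
```

If the order of the eos chars ever did matter, a LinkedHashSet would preserve insertion order while keeping constant-time lookups on average.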
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)