Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2016/11/08 11:57:59 UTC
[jira] [Commented] (OPENNLP-772) Japanese end of sentence fix
[ https://issues.apache.org/jira/browse/OPENNLP-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647343#comment-15647343 ]
Joern Kottmann commented on OPENNLP-772:
----------------------------------------
We are preparing the next release, and this could easily be pulled in. Do you still need this change? Did it turn out to work well for you?
> Japanese end of sentence fix
> ----------------------------
>
> Key: OPENNLP-772
> URL: https://issues.apache.org/jira/browse/OPENNLP-772
> Project: OpenNLP
> Issue Type: Improvement
> Components: Sentence Detector
> Affects Versions: tools-1.5.3
> Reporter: Bar Perach
> Labels: patch
> Fix For: 1.7.0
>
>
> The end-of-sentence character list was wrong for Japanese.
> Removed duplicate code.
> Index: opennlp-tools/src/main/java/opennlp/tools/sentdetect/lang/Factory.java
> ===================================================================
> --- opennlp-tools/src/main/java/opennlp/tools/sentdetect/lang/Factory.java (revision 1678426)
> +++ opennlp-tools/src/main/java/opennlp/tools/sentdetect/lang/Factory.java (local)
> @@ -36,14 +36,12 @@
>
> public static final char[] thEosCharacters = new char[] { ' ','\n' };
>
> + // TODO add more sentence enders
> + public static final char[] jpEosCharacters = new char[] {'。', '!', '?'};
> +
> public EndOfSentenceScanner createEndOfSentenceScanner(String languageCode) {
> - if ("th".equals(languageCode)) {
> - return new DefaultEndOfSentenceScanner(new char[]{' ','\n'});
> - } else if("pt".equals(languageCode)) {
> - return new DefaultEndOfSentenceScanner(ptEosCharacters);
> - }
>
> - return new DefaultEndOfSentenceScanner(defaultEosCharacters);
> + return new DefaultEndOfSentenceScanner(getEOSCharacters(languageCode));
> }
>
> public EndOfSentenceScanner createEndOfSentenceScanner(
> @@ -76,6 +74,8 @@
> return thEosCharacters;
> } else if ("pt".equals(languageCode)) {
> return ptEosCharacters;
> + } else if ("jp".equals(languageCode)) {
> + return jpEosCharacters;
> }
>
> return defaultEosCharacters;
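
For readers skimming the patch, the refactoring it makes can be sketched in isolation: one lookup method owns the language-to-characters mapping, and the scanner factory method delegates to it instead of repeating the if/else chain. This is a minimal standalone sketch, not the actual Factory class. The class name EosCharactersSketch, the constant names, and the default character set are assumptions made for illustration; the "th" and "jp" values and the "jp" language code are copied from the diff as posted.

```java
// Hypothetical standalone sketch of the lookup pattern in the patch.
// It does not depend on OpenNLP; names here are illustrative only.
final class EosCharactersSketch {

    // Assumed default set, for illustration; the real defaults are not shown in the diff.
    static final char[] DEFAULT_EOS = {'.', '!', '?'};

    // Taken from the diff: Thai sentences end at a space or newline.
    static final char[] TH_EOS = {' ', '\n'};

    // Taken from the diff: '\u3002' is the ideographic full stop.
    static final char[] JP_EOS = {'\u3002', '!', '?'};

    private EosCharactersSketch() {
    }

    // Single place that maps a language code to its end-of-sentence characters.
    // Unknown or null codes fall back to the default set.
    static char[] getEosCharacters(String languageCode) {
        if ("th".equals(languageCode)) {
            return TH_EOS;
        } else if ("jp".equals(languageCode)) {
            // Language code as used in the patch.
            return JP_EOS;
        }
        return DEFAULT_EOS;
    }

    // Tells whether a character ends a sentence for the given language,
    // using only the lookup above (stands in for constructing a scanner).
    static boolean isEos(String languageCode, char c) {
        for (char eos : getEosCharacters(languageCode)) {
            if (eos == c) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, supporting another language means adding one branch in getEosCharacters, and every caller picks it up without further changes.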