Posted to issues@opennlp.apache.org by "Mark Giaconia (JIRA)" <ji...@apache.org> on 2013/10/03 01:54:43 UTC

[jira] [Commented] (OPENNLP-602) SentenceDetector should support new line as an end of sentence char

    [ https://issues.apache.org/jira/browse/OPENNLP-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784646#comment-13784646 ] 

Mark Giaconia commented on OPENNLP-602:
---------------------------------------

This is especially important when extracting text from PowerPoint files and the like, where bullet points typically do not end with a full stop. I recommend making it optional (a boolean overload, or a setter one can call after instantiating the SentenceDetectorME), because I have seen situations where this makes sense, but it is conditional and should be used wisely.
Thoughts?
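
To make the "optional setter" idea concrete, here is a minimal sketch of how such a switch might behave. It is not OpenNLP code: the class NewlineAwareSentenceDetector and the setter setNewlineIsEos are hypothetical names, and the underlying detector is passed in as a plain function (for example, one that calls sentDetect on a SentenceDetectorME) so the sketch stays self-contained. It only pre-splits the input on line breaks; it does not touch the model or the training code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical wrapper, not part of OpenNLP: optionally treats every line
 * break as a hard sentence boundary before delegating to a real detector.
 */
class NewlineAwareSentenceDetector {

    // Stand-in for a real detector, e.g. text -> sentenceDetectorME.sentDetect(text).
    private final Function<String, String[]> delegate;

    // Off by default, so existing behavior is unchanged unless the caller opts in.
    private boolean newlineIsEos = false;

    NewlineAwareSentenceDetector(Function<String, String[]> delegate) {
        this.delegate = delegate;
    }

    // Hypothetical setter, called after instantiation as suggested above.
    void setNewlineIsEos(boolean newlineIsEos) {
        this.newlineIsEos = newlineIsEos;
    }

    String[] sentDetect(String text) {
        if (!newlineIsEos) {
            return delegate.apply(text);
        }
        List<String> sentences = new ArrayList<>();
        // \R matches any line break sequence (\n, \r\n, \r, ...), Java 8 and later.
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between bullet points
            }
            // Each line is still run through the delegate, so a line holding
            // several sentences is split further by the underlying detector.
            sentences.addAll(Arrays.asList(delegate.apply(trimmed)));
        }
        return sentences.toArray(new String[0]);
    }
}
```

With the flag left off, the wrapper returns exactly what the delegate returns; with it on, bullet-style lines without a full stop come back as separate sentences. A real implementation inside the detector would also need to report correct character offsets, which this sketch ignores.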


> SentenceDetector should support new line as an end of sentence char
> -------------------------------------------------------------------
>
>                 Key: OPENNLP-602
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-602
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>    Affects Versions: tools-1.5.3
>            Reporter: Joern Kottmann
>            Assignee: Joern Kottmann
>            Priority: Minor
>
> The Sentence Detector should support treating new line chars as the end of a sentence. This will probably require special handling in the training code to assume that there is a new line char if any other eos is missing.



--
This message was sent by Atlassian JIRA
(v6.1#6144)