Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2014/01/25 18:34:38 UTC

[jira] [Reopened] (OPENNLP-602) SentenceDetector should support new line as an end of sentence char

     [ https://issues.apache.org/jira/browse/OPENNLP-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reopened OPENNLP-602:
------------------------------------


The following error was reported by Tim Miller from the cTAKES project:

I'm running into one issue: it gets tripped up on sentences with
line-ending spaces.  I could easily remove them with a script, but by
default they are in there. It happens when a sentence example ends like this:

...BILAT HEMATOMAS.  <LF>

(There is a period, then two spaces, then the line feed character.) I am
pretty sure this is the root cause, because when I change this example to
end in .<LF> it gets tripped up in another place instead (with the same
error). The specific error I get is this:

> Exception in thread "main" java.lang.IllegalArgumentException: start index must not be larger than end index: start=8842, end=8839
>     at opennlp.tools.util.Span.<init>(Span.java:47)
>     at opennlp.tools.util.Span.<init>(Span.java:63)
>     at opennlp.tools.sentdetect.SentenceDetectorME.sentPosDetect(SentenceDetectorME.java:244)
>     at opennlp.tools.sentdetect.SentenceDetectorEvaluator.processSample(SentenceDetectorEvaluator.java:56)
>     at opennlp.tools.sentdetect.SentenceDetectorEvaluator.processSample(SentenceDetectorEvaluator.java:1)
>     at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:82)
>     at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:109)
>     at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:130)
>     at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:78)
>     at opennlp.tools.cmdline.CLI.main(CLI.java:214)
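
As a stopgap until the detector handles this input, the cleanup script mentioned in the report could look roughly like the sketch below. It only removes spaces and tabs that sit directly before a line break or the end of the text. The class and method names are hypothetical and not part of OpenNLP, and the sketch does not address the underlying Span problem.

```java
import java.util.regex.Pattern;

// Hypothetical preprocessing helper; not part of the OpenNLP API.
final class TrailingWhitespaceCleaner {

    // One or more spaces or tabs directly before a line break (LF or CRLF)
    // or before the end of the input.
    private static final Pattern TRAILING =
            Pattern.compile("[ \\t]+(?=\\r?\\n|\\z)");

    private TrailingWhitespaceCleaner() {
    }

    // Returns the text with line-ending spaces and tabs removed.
    // Line breaks themselves are left untouched.
    static String stripLineEndingSpaces(String text) {
        return TRAILING.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "BILAT HEMATOMAS.  \nNEXT SENTENCE.\n";
        // Make the line feeds visible when printing.
        System.out.println(stripLineEndingSpaces(sample).replace("\n", "<LF>"));
    }
}
```

Note that removing characters shifts all later character offsets, so any sentence spans computed on the cleaned text no longer line up with the original file.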


> SentenceDetector should support new line as an end of sentence char
> -------------------------------------------------------------------
>
>                 Key: OPENNLP-602
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-602
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Sentence Detector
>    Affects Versions: tools-1.5.3
>            Reporter: Joern Kottmann
>            Assignee: Joern Kottmann
>            Priority: Minor
>             Fix For: 1.6.0
>
>
> The Sentence Detector should support treating new line chars as the end of a sentence. This will probably require special handling in the training code, so that it assumes a new line char is the end of the sentence whenever no other eos char is present.
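
The fallback rule described in the issue can be illustrated with a small sketch, assuming training data with one sentence per line. It is not the actual OpenNLP training code: the class name, the method name and the set of eos characters are assumptions made for illustration only.

```java
// Hypothetical illustration of the "new line as eos" fallback; not OpenNLP code.
final class EosFallback {

    // Assumed end-of-sentence characters for this sketch.
    private static final String EOS_CHARS = ".!?";

    private EosFallback() {
    }

    // Returns the character that ends the sentence on the given training line:
    // the last non-whitespace character if it is a known eos char,
    // otherwise the new line char that terminates the line.
    static char endOfSentenceChar(String line) {
        int end = line.length();
        // Skip trailing whitespace such as the line-ending spaces from the report.
        while (end > 0 && Character.isWhitespace(line.charAt(end - 1))) {
            end--;
        }
        if (end > 0 && EOS_CHARS.indexOf(line.charAt(end - 1)) >= 0) {
            return line.charAt(end - 1);
        }
        return '\n';
    }
}
```

With this rule, a line such as "BILAT HEMATOMAS.  " still ends at the period, while a heading-like line without punctuation ends at the new line char.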


