Posted to dev@opennlp.apache.org by Yuan Luo <yu...@gmail.com> on 2011/11/28 00:26:51 UTC

sentence detector

Hi,
Does the team have plans to handle bracketed text? For example, the
sentence
An EGD on 10/24/06 showed mild antral erosions ( mild regeneration , 
nonspecific; no H. pylori ).
is split into two at "H." by OpenNLP 1.5.1 with the 1.5.0 models.
Intuitively, text inside brackets should not be able to trigger a
sentence break.
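One way to get this behaviour without retraining is a post-processing pass over the detector's output. The sketch below is a hypothetical helper, not part of OpenNLP. It assumes you already have the sentence end offsets (for example, the `getEnd()` values of the `Span` objects returned by `SentenceDetectorME.sentPosDetect`) and drops any boundary that falls while a round bracket is still open. Only `(` and `)` are tracked, because square brackets may legitimately contain whole sentences.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical post-processing filter (not an OpenNLP API): removes
 * sentence boundaries that a detector placed inside an unclosed "(".
 */
class BracketAwareBoundaryFilter {

    /**
     * @param text       the original text
     * @param boundaries exclusive end offsets of detected sentences, ascending
     * @return the boundaries that do not fall inside "( ... )"
     */
    static List<Integer> filter(String text, List<Integer> boundaries) {
        List<Integer> kept = new ArrayList<>();
        int depth = 0; // current round-bracket nesting depth
        int pos = 0;   // next character to scan
        for (int end : boundaries) {
            // Advance the nesting depth over all characters before this boundary.
            for (; pos < end && pos < text.length(); pos++) {
                char c = text.charAt(pos);
                if (c == '(') {
                    depth++;
                } else if (c == ')' && depth > 0) {
                    depth--;
                }
            }
            // Keep a boundary only when no bracket is open; always keep end of text.
            if (depth == 0 || end >= text.length()) {
                kept.add(end);
            }
        }
        return kept;
    }
}
```

Applied to the example above, the spurious boundary after "H." is dropped because it lies inside "( ... )", and only the boundary at the end of the text survives. An unbalanced "(" would suppress every later boundary except the last, so a real implementation might cap how far a bracket can stay open.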

Thanks,
Yuan

Re: sentence detector

Posted by Alec Taylor <al...@gmail.com>.
Quick note:

If you implement this, make sure the detector still analyses
square-bracketed text [], as it often contains whole sentences.

Probably the best solution would be to add a condition that if a
word begins with a capital letter and ends in a full stop, it does not
end a sentence.
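A minimal sketch of that rule as a token-level check, assuming the text has already been tokenized. The class and method names are hypothetical; this is one literal reading of the suggestion, not an OpenNLP feature.

```java
/**
 * Hypothetical token-level rule: a token that starts with an upper-case
 * letter and ends with a full stop is not allowed to end a sentence.
 */
class CapitalizedPeriodRule {

    /** Returns false when the rule vetoes a sentence boundary after this token. */
    static boolean allowsBoundary(String token) {
        if (token.length() < 2 || !token.endsWith(".")) {
            // No trailing full stop (or a bare "."): leave the decision to the detector.
            return true;
        }
        // e.g. "H." or "Dr.": starts with a capital and ends with '.', so no boundary.
        return !Character.isUpperCase(token.charAt(0));
    }
}
```

With this rule "H." no longer ends a sentence while "pylori." still can. Taken literally, though, it also blocks genuine boundaries after capitalized words such as "Smith." at the end of a sentence, so in practice it would probably need restricting, for example to single-letter initials.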


Re: sentence detector

Posted by Jörn Kottmann <ko...@gmail.com>.
The sentence detector is learnable, and the model on the website
was trained on news text. I suggest that you take data from your
own domain and train a new model.
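Training a domain model needs sentence-annotated text. As we understand the OpenNLP manual, the sentence detector's training data is one sentence per line, with an empty line marking a document boundary. The hypothetical helper below renders already-segmented in-domain documents (for example, hand-corrected clinical notes) in that layout; the exact trainer options for your OpenNLP version should be checked in the manual.

```java
import java.util.List;

/**
 * Hypothetical helper (not an OpenNLP API): renders segmented documents as
 * one sentence per line, with an empty line between documents.
 */
class SentenceTrainingData {

    static String render(List<List<String>> documents) {
        StringBuilder out = new StringBuilder();
        for (int d = 0; d < documents.size(); d++) {
            if (d > 0) {
                out.append('\n'); // empty line marks a document boundary
            }
            for (String sentence : documents.get(d)) {
                // Each sentence goes on its own line, surrounding whitespace removed.
                out.append(sentence.trim()).append('\n');
            }
        }
        return out.toString();
    }
}
```

Sentences like the bracketed example above would appear on a single line in such data, so a model trained on it can learn that "H." inside a clinical note is not a boundary.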

Jörn
