Posted to dev@opennlp.apache.org by Yuan Luo <yu...@gmail.com> on 2011/11/28 00:26:51 UTC
sentence detector
Hi,
Does the team have plans to deal with bracketed text? For example
The sentence
An EGD on 10/24/06 showed mild antral erosions ( mild regeneration ,
nonspecific; no H. pylori ).
is split in two at "H." by OpenNLP 1.5.1 with the 1.5.0 models.
Intuitively, it would be natural to keep bracketed text from
affecting sentence breaking.
Thanks,
Yuan
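A minimal sketch of the idea in the post above, written as a post-processing step rather than a change to OpenNLP itself. It assumes the detector's output is already available as a list of strings; the names bracket_depth and merge_bracketed_splits are hypothetical and not part of any OpenNLP API. Pieces are glued back together for as long as a bracket opened in an earlier piece is still unclosed.

```python
# Hypothetical post-processing helper; not an OpenNLP API.
OPENERS = "([{"
CLOSERS = ")]}"


def bracket_depth(text, depth=0):
    """Return the bracket nesting depth after reading text, starting at depth."""
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS and depth > 0:
            depth -= 1
    return depth


def merge_bracketed_splits(sentences):
    """Merge consecutive pieces while a bracket is still open."""
    merged = []
    pending = []
    depth = 0
    for piece in sentences:
        pending.append(piece)
        depth = bracket_depth(piece, depth)
        if depth == 0:
            # All brackets are balanced, so this is a real boundary.
            merged.append(" ".join(pending))
            pending = []
    if pending:
        # Unbalanced input: keep the remainder rather than drop it.
        merged.append(" ".join(pending))
    return merged


# The split reported above, as two pieces from the detector.
pieces = [
    "An EGD on 10/24/06 showed mild antral erosions ( mild regeneration , nonspecific; no H.",
    "pylori ).",
]
result = merge_bracketed_splits(pieces)
```

This treats every bracket type the same way and does not check that an opener matches its closer, so a stray unmatched bracket will cause later sentences to be merged as well.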
Re: sentence detector
Posted by Alec Taylor <al...@gmail.com>.
Quick note:
If you implement this, ensure NLP analyses square-bracketed text [],
as that often contains whole sentences.
Probably the best solution would be to add a condition that if a
word begins with a capital and ends in a full stop, it is not treated
as a sentence break.
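A rough sketch of that condition, using a naive whitespace-based splitter rather than the OpenNLP detector. One assumption is added here: the rule is narrowed to a single capital letter followed by a full stop (an initial such as "H."), since applying it to every capitalised word would also block breaks after words like "Paris.". The names INITIAL and naive_split are hypothetical.

```python
import re

# Assumption: only a lone capital letter plus a full stop counts, e.g. "H."
INITIAL = re.compile(r"^[A-Z]\.$")


def naive_split(text):
    """Split on tokens ending in . ! or ? unless the token is an initial."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and not INITIAL.match(token):
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing text without closing punctuation.
        sentences.append(" ".join(current))
    return sentences


text = (
    "An EGD on 10/24/06 showed mild antral erosions ( mild regeneration , "
    "nonspecific; no H. pylori ). A second sentence."
)
parts = naive_split(text)
```

With this rule the example is no longer broken at "H.", but a sentence that genuinely ends in an initial would also be left unsplit.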
On Mon, Nov 28, 2011 at 10:26 AM, Yuan Luo <yu...@gmail.com> wrote:
> Hi,
> Does the team have plans to deal with bracketed text? For example
> The sentence
> An EGD on 10/24/06 showed mild antral erosions ( mild regeneration ,
> nonspecific; no H. pylori ).
> will be split into two at "H." by the Opennlp 1.5.1 with 1.5.0 models.
> Intuitively, it would be natural to separate bracketed texts from affecting
> sentence breakers.
>
> Thanks,
> Yuan
Re: sentence detector
Posted by Jörn Kottmann <ko...@gmail.com>.
The sentence detector is trainable, and the model from
the website was trained on news. I suggest that you take
data from your own domain and train a new model.
Jörn
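As a sketch of the data preparation this implies: the OpenNLP sentence detector trainer reads plain text with one sentence per line, with an empty line marking a document boundary. The helper below only builds such a file from sentences that have already been split by hand; the function name, the file name and the sample sentences other than the one from this thread are made up for illustration.

```python
def format_training_data(documents):
    """Render documents (lists of sentences) as one sentence per line,
    with an empty line after each document."""
    lines = []
    for sentences in documents:
        for sentence in sentences:
            # Collapse internal line breaks so each sentence stays on one line.
            lines.append(" ".join(sentence.split()))
        lines.append("")  # document boundary
    return "\n".join(lines)


# Illustrative data only; real training needs many in-domain sentences.
documents = [
    [
        "An EGD on 10/24/06 showed mild antral erosions ( mild regeneration ,\n"
        "nonspecific; no H. pylori ).",
        "A made-up follow-up sentence.",
    ],
    ["A made-up sentence from a second document."],
]
training_text = format_training_data(documents)

# To save it (the file name is an assumption):
# open("clinical-sent.train", "w", encoding="utf-8").write(training_text)
```

The resulting file can then be passed to the trainer, for example via the SentenceDetectorTrainer command-line tool; check the manual for your OpenNLP version for the exact options.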
On 11/28/11 12:26 AM, Yuan Luo wrote:
> Hi,
> Does the team have plans to deal with bracketed text? For example
> The sentence
> An EGD on 10/24/06 showed mild antral erosions ( mild regeneration ,
> nonspecific; no H. pylori ).
> will be split into two at "H." by the Opennlp 1.5.1 with 1.5.0 models.
> Intuitively, it would be natural to separate bracketed texts from
> affecting sentence breakers.
>
> Thanks,
> Yuan