Posted to users@opennlp.apache.org by Surendra <su...@aol.in> on 2013/02/08 18:46:20 UTC

Need en-sent.train file

Hi,
I am a postgraduate student in computer science, working on sentence boundary detection for a local Indian language. Could you please describe the format of the training file and provide a sample file like en-sent.train? That would be helpful for creating a model.


regards,
Surendra H
R. V. College of Engineering, Bangalore, India



Re: Need en-sent.train file

Posted by Jörn Kottmann <ko...@gmail.com>.
On 02/08/2013 06:46 PM, Surendra wrote:
> Hi,
> I am a postgraduate student in computer science, working on sentence boundary detection for a local Indian language. Could you please describe the format of the training file and provide a sample file like en-sent.train? That would be helpful for creating a model.
>
>

The training data used to build the en-sent.bin sentence detector model
is not open source. The easiest way to get training data is to take a
corpus and extract its sentences; a couple of freely or cheaply
available corpora could be used for this. Some corpus formats are
already supported by OpenNLP, so have a look at the manual.

Jörn

Re: Need en-sent.train file

Posted by James Kosin <ja...@gmail.com>.
The only requirement is that each sentence be on its own line in the
training file.
Don't put non-sentences in the training file.
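
As a rough illustration of the one-sentence-per-line layout described above, here is a minimal Java sketch. It assumes you already have the sentences of your corpus as separate strings; the class name SentTrainWriter, the method names, the sample sentences, and the output file name my-sent.train are all made up for this example and are not part of OpenNLP. It only prepares the text file and does not call the OpenNLP API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes already-segmented sentences one per line.
class SentTrainWriter {

    // Collapse whitespace runs (including line breaks inside a sentence)
    // into single spaces so each sentence fits on exactly one line,
    // and drop entries that are empty after trimming.
    static List<String> toTrainingLines(List<String> sentences) {
        List<String> lines = new ArrayList<>();
        for (String s : sentences) {
            String line = s.replaceAll("\\s+", " ").trim();
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Write the normalized sentences to a UTF-8 text file, one per line.
    static void write(Path out, List<String> sentences) throws IOException {
        Files.write(out, toTrainingLines(sentences), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder sentences; replace with sentences from your own corpus.
        List<String> sentences = Arrays.asList(
                "This is the first sentence.",
                "This one was wrapped\nacross two lines in the corpus.",
                "Is this the third sentence?");
        write(Paths.get("my-sent.train"), sentences);
    }
}
```

The resulting file should then be usable as input for the sentence detector trainer; see the OpenNLP manual for the exact training options and for the encoding setting, which matters for non-English text.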
