Posted to users@opennlp.apache.org by Carlos A <na...@gmail.com> on 2015/12/25 07:15:02 UTC

Sentences without punctuation

Hello all,

Is there a better way to separate sentences that have NO punctuation
with OpenNLP?

The sentence detector will not work in some cases.

In other words, I would like to separate phrases by doing some sort of
sentence boundary disambiguation on transcripts that have no
punctuation. Then, once the sentences are separated, I would add the
punctuation: a period, or a question mark if the sentence starts as a
question.

Something like using the chunker so that I can determine the sentences
based on their chunk patterns: NP VP, NP VP NP, and so on.

Thank you.

C.

Re: Sentences without punctuation

Posted by James Kosin <ja...@gmail.com>.
Good luck.  The only issue I'd be worried about is that English sentence
structure can be very complex, and even with a well-trained chunker (POS)
parser, you may still end up with a large number of false-positive
sentences.
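
To make the worry concrete, here is a minimal sketch of splitting on chunk patterns. It assumes the chunker output has already been grouped into (text, tag) phrases; the PATTERNS list and the segment() helper are hypothetical names for illustration, not OpenNLP calls.

```python
# Hypothetical chunk patterns, longest first so NP VP NP wins over NP VP.
PATTERNS = [("NP", "VP", "NP"), ("NP", "VP")]


def segment(chunks):
    """Greedily group (text, tag) chunks into sentences by matching PATTERNS."""
    sentences, i = [], 0
    while i < len(chunks):
        tags = tuple(tag for _, tag in chunks[i:i + 3])
        for pattern in PATTERNS:
            if tags[:len(pattern)] == pattern:
                size = len(pattern)
                break
        else:
            size = 1  # no pattern matched; emit the chunk on its own
        sentences.append(" ".join(text for text, _ in chunks[i:i + size]))
        i += size
    return sentences


good = [("the dog", "NP"), ("chased", "VP"), ("the cat", "NP"),
        ("it", "NP"), ("ran away", "VP")]
print(segment(good))

# An embedded clause ("she said he left") trips the greedy match.
bad = [("she", "NP"), ("said", "VP"), ("he", "NP"), ("left", "VP")]
print(segment(bad))
```

The first input splits as hoped, but the second is cut into "she said he" and "left", which is the kind of false positive a purely pattern-based split can produce.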

On 12/27/2015 1:27 PM, Carlos A wrote:
> Thank you James. I only work with English. I think that training is not the
> way to go; instead I am going through sentence structure, which is much more
> intelligent than using some sort of statistical method.


Re: Sentences without punctuation

Posted by Carlos A <na...@gmail.com>.
Thank you James. I only work with English. I think that training is not the
way to go; instead I am going through sentence structure, which is much more
intelligent than using some sort of statistical method.

On Sat, Dec 26, 2015 at 9:00 PM, James Kosin <ja...@gmail.com> wrote:

> Carlos,
>
> It is possible to train a sentence detector to separate sentences;
> however, you will have to provide your own training set.  For the training
> set you wouldn't have any punctuation and each sentence would be on a
> separate line.
> Be warned, you will need a lot of training data in this case due to the
> absence of the punctuation.
>
> The harder part will be getting a model to add the proper punctuation.  In
> English we have the keywords of:  How, When, Where, Who, What... to help
> determine questions.  Other languages use other keys to denote questions,
> statements, and expressions in a sentence.
>
> Hopefully, you don't have to work with English, because in most cases it
> isn't easy to determine sentence boundaries based on the grammar or word
> composition alone.  English is very bad about that.
>
> Good Luck, it sounds like you have an interesting problem.
>
> James Kosin

Re: Sentences without punctuation

Posted by James Kosin <ja...@gmail.com>.
Carlos,

It is possible to train a sentence detector to separate sentences; 
however, you will have to provide your own training set.  For the 
training set you wouldn't have any punctuation and each sentence would 
be on a separate line.
Be warned, you will need a lot of training data in this case due to the 
absence of the punctuation.
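
As a rough illustration of preparing such a training file, here is a minimal sketch. It assumes you already have a punctuated corpus that has been split into sentences; the to_training_lines() helper and the choice to lowercase are assumptions for illustration, not part of OpenNLP. The only format detail taken from the description above is one unpunctuated sentence per line.

```python
import string

# Punctuation to blank out; apostrophes are kept so "don't" stays one word.
_PUNCT = string.punctuation.replace("'", "")
_TO_SPACE = str.maketrans(_PUNCT, " " * len(_PUNCT))


def to_training_lines(sentences, lowercase=True):
    """Turn punctuated sentences into unpunctuated lines, one sentence per line."""
    lines = []
    for sentence in sentences:
        text = sentence.translate(_TO_SPACE)
        text = " ".join(text.split())  # collapse runs of whitespace
        if lowercase:
            text = text.lower()
        if text:  # skip sentences that were only punctuation
            lines.append(text)
    return lines


corpus = ["How are you today?", "I am fine, thanks.", "..."]
print("\n".join(to_training_lines(corpus)))
```

Lowercasing is optional here; transcripts often lack reliable capitalization, so training on lowercased text may match the input better, but that is a guess about the data.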

The harder part will be getting a model to add the proper punctuation.
In English we have keywords such as How, When, Where, Who, What... to
help determine questions.  Other languages use other keys to denote
questions, statements, and expressions in a sentence.
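
A minimal sketch of the keyword idea for English, assuming the sentences have already been separated. QUESTION_WORDS holds the keywords listed above; the AUXILIARIES set and the punctuate() helper are assumptions added for illustration.

```python
# Keywords named in the paragraph above.
QUESTION_WORDS = {"how", "when", "where", "who", "what"}
# Assumption: auxiliaries that often open yes/no questions; not from the thread.
AUXILIARIES = {"is", "are", "do", "does", "did", "can", "could",
               "will", "would", "should"}


def punctuate(sentence, use_auxiliaries=True):
    """Capitalize the sentence and end it with '?' or '.' based on its first word."""
    words = sentence.split()
    if not words:
        return ""
    first = words[0].lower()
    is_question = first in QUESTION_WORDS or (
        use_auxiliaries and first in AUXILIARIES)
    text = " ".join(words)
    text = text[0].upper() + text[1:]
    return text + ("?" if is_question else ".")


for s in ["where is the station", "the train left", "what a nice day"]:
    print(punctuate(s))
```

The last input comes out as "What a nice day?", so a first-word check alone will mislabel exclamations and clauses such as "when he arrived we left".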

Hopefully, you don't have to work with English, because in most cases it
isn't easy to determine sentence boundaries based on the grammar or word
composition alone.  English is very bad about that.

Good Luck, it sounds like you have an interesting problem.

James Kosin
