Posted to users@opennlp.apache.org by Miao Chen <vi...@gmail.com> on 2010/12/08 05:34:44 UTC

Asking a question about

Hello,

I am working on an NLP project and using OpenNLP for preprocessing. When
using the OpenNLP sentence detector to get sentences out of text, we'd like to
get a confidence score for each detected sentence, namely the probability
that an identified sentence is a real sentence. I found in the API that there
is a method called getSentenceProbabilities() in the SentenceDetectorME
class, which seems to provide the sentence probability I need. However, I ran
into a problem when using this method, which I will illustrate with an example.
My input text for the sentence detector is:
"2.1.4 About JAVA package"

The sentence detector returns two sentences: "2.1.4" and "About JAVA
package" (which I can accept :)
But getSentenceProbabilities() returns only one probability (which is
0.9924560093213692), and I don't know which "sentence" this probability
belongs to. This method is supposed to return an array of probabilities, one
per sentence, but in this case only one probability is returned.
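To make the mismatch concrete, here is a minimal sketch that pairs each detected sentence with its probability and fails loudly when the two arrays differ in length. It does not call OpenNLP: the class `SentenceScores` and its `pair` method are hypothetical, and the two arrays simply stand in for what sentDetect() and getSentenceProbabilities() returned for the example input above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of OpenNLP): pairs sentences with probabilities. */
public class SentenceScores {

    /** Returns "sentence -> probability" lines, or throws if the arrays do not line up. */
    public static List<String> pair(String[] sentences, double[] probs) {
        if (sentences.length != probs.length) {
            // The situation described above: two sentences but one probability.
            throw new IllegalStateException(
                "sentences=" + sentences.length + ", probabilities=" + probs.length);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < sentences.length; i++) {
            out.add(sentences[i] + " -> " + probs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Values copied from the example in this post.
        String[] sentences = {"2.1.4", "About JAVA package"};
        double[] probs = {0.9924560093213692};
        try {
            pair(sentences, probs);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // sentences=2, probabilities=1
        }
    }
}
```

Checking the lengths up front makes the problem visible immediately instead of silently attaching the single score to the wrong sentence.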

Can anyone explain this, or tell me how to get the probabilities of the
identified sentences? I'd really appreciate any help. Thank you so
much!

Best,
Miao

-- 
Miao Chen

Ph.D. Candidate
School of Information Studies
Syracuse University

Re: Asking a question about

Posted by Jörn Kottmann <ko...@gmail.com>.
On 12/8/10 5:34 AM, Miao Chen wrote:
> The sentence detector returns two sentences: "2.1.4" and "About JAVA
> package" (which I can accept :)
> But getSentenceProbabilities() returns only one probability (which is
> 0.9924560093213692), and I don't know which "sentence" this probability
> belongs to. This method is supposed to return an array of probabilities,
> but in this case only one probability is returned.

I tried to reproduce your problem, but the code seems to work. In this
case the sentence detector gives you back an array with two elements,
right? The second probability is just "1.0".
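If you are stuck on a version that returns the short array, one possible workaround is to pad the missing entries with 1.0, which matches the value Jörn reports for the second sentence. This is a sketch under an assumption: it treats any missing probabilities as belonging to the trailing sentence(s). The class `ProbabilityPadding` and method `padWithOnes` are hypothetical and not part of OpenNLP.

```java
import java.util.Arrays;

/** Hypothetical workaround (not part of OpenNLP) for a too-short probability array. */
public class ProbabilityPadding {

    /**
     * Pads probs with 1.0 up to sentenceCount. Assumes the missing entries
     * belong to the trailing sentence(s), mirroring the "1.0" described above.
     */
    public static double[] padWithOnes(double[] probs, int sentenceCount) {
        if (probs.length >= sentenceCount) {
            return probs; // nothing is missing
        }
        double[] padded = Arrays.copyOf(probs, sentenceCount); // new slots start at 0.0
        Arrays.fill(padded, probs.length, sentenceCount, 1.0);  // overwrite them with 1.0
        return padded;
    }

    public static void main(String[] args) {
        double[] fixed = padWithOnes(new double[] {0.9924560093213692}, 2);
        System.out.println(Arrays.toString(fixed)); // [0.9924560093213692, 1.0]
    }
}
```

Note that a padded 1.0 is not a real model score; it only keeps the arrays aligned.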

Can you confirm that? If not, which version did you try?

Thanks,
Jörn


Re: Asking a question about

Posted by Jörn Kottmann <ko...@gmail.com>.
On 12/13/10 3:03 AM, Miao Chen wrote:
> Hi Jörn,
>
> Sorry for the late reply. Yes, when I added a dot at the end of the input
> string, it outputs two sentences and a probability array of two elements.
> Thank you so much for answering this!

It really seems to be a bug. We added it to our issue management
system:
https://issues.apache.org/jira/browse/OPENNLP-8

Jörn

Re: Asking a question about

Posted by Miao Chen <vi...@gmail.com>.
Hi Jörn,

Sorry for the late reply. Yes, when I added a dot at the end of the input
string, it outputs two sentences and a probability array of two elements.
Thank you so much for answering this!

Best,
Miao

On Thu, Dec 9, 2010 at 7:10 AM, Joern Kottmann <ko...@gmail.com> wrote:

> Hello,
>
> it seems to be a bug which only appears if the last sentence
> is not terminated with an end-of-sentence char. Can you confirm that?
> Check whether the result is correct if you attach a dot to your last sentence.
>
> It would be nice if you could open a bug report.
>
> Jörn

Re: Asking a question about

Posted by Joern Kottmann <ko...@gmail.com>.
Hello,

it seems to be a bug which only appears if the last sentence
is not terminated with an end-of-sentence char. Can you confirm that?
Check whether the result is correct if you attach a dot to your last sentence.
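The "attach a dot" check can be automated before the text is passed to the sentence detector. This is a minimal sketch: the class `TerminateText` and method `ensureTerminated` are hypothetical, and treating only '.', '!' and '?' as end-of-sentence characters is an assumption (a given OpenNLP model may use a different set).

```java
/** Hypothetical helper (not part of OpenNLP): applies the "attach a dot" workaround. */
public class TerminateText {

    // Assumption: these count as end-of-sentence characters for the model in use.
    private static final String EOS_CHARS = ".!?";

    /** Appends "." if the text (ignoring trailing whitespace) lacks an EOS character. */
    public static String ensureTerminated(String text) {
        String trimmed = text.stripTrailing(); // Java 11+
        if (trimmed.isEmpty()) {
            return text; // nothing to terminate
        }
        char last = trimmed.charAt(trimmed.length() - 1);
        return EOS_CHARS.indexOf(last) >= 0 ? text : trimmed + ".";
    }

    public static void main(String[] args) {
        System.out.println(ensureTerminated("2.1.4 About JAVA package")); // 2.1.4 About JAVA package.
    }
}
```

Keep in mind that the appended dot becomes part of the last detected sentence, so strip it afterwards if the exact original text matters.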

It would be nice if you could open a bug report.

Jörn

