Posted to users@opennlp.apache.org by "mcsekhar@gmail.com" <mc...@gmail.com> on 2011/09/26 06:30:08 UTC

SentenceDetector not working on Linux

I ran OpenNLP's SentenceDetector command line tool on the example text given
at
http://incubator.apache.org/opennlp/documentation/manual/opennlp.html#tools.sentdetect.detection

But it doesn't give the correct output shown at that same link. Instead,
it gives this output:
Pierre Vinken, 61 years old, will join the board as a nonexecutive director
Nov. 29.
Mr. Vinken is chairman of Elsevier N.V.,
the Dutch publishing group.
Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields
PLC,
was named a director of this British industrial conglomerate.

It created 5 sentences instead of 3.

I also tried the SentenceDetector Java API, and it gives incorrect
output as well.

A friend of mine ran the command line tool and used the Java API on Windows,
and it worked for him.
Hence, I am guessing this could be a Linux-specific problem.

Thanks
Chandra

Re: SentenceDetector not working on Linux

Posted by Jörn Kottmann <ko...@gmail.com>.

It might have to do with the whitespace between the sentences. Maybe there
are some differences between the test you ran and the one your friend ran.

You can easily check that by trying out our current release candidate,
because we fixed the whitespace handling in the sentence detector there.

It can be downloaded from here:
http://people.apache.org/~joern/releases/opennlp-1.5.2-incubating/rc1/
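One rough way to compare your input with your friend's is to count the whitespace characters each file actually contains. The sketch below is stdlib-only Java; the `WhitespaceReport` class and its `countWhitespace` helper are hypothetical and not part of OpenNLP. The choice of characters to count (CR, LF, tab, non-breaking space) is an assumption about what typically differs between a file saved on Windows, one saved on Linux, and text copied from a web page.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class WhitespaceReport {

    /** Hypothetical helper: counts whitespace characters that often differ between inputs. */
    static Map<String, Integer> countWhitespace(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("space", 0);
        counts.put("tab", 0);
        counts.put("LF", 0);   // '\n', the usual Linux line ending
        counts.put("CR", 0);   // '\r', part of the Windows "\r\n" line ending
        counts.put("NBSP", 0); // non-breaking space, common in text copied from HTML
        for (int i = 0; i < text.length(); i++) {
            switch (text.charAt(i)) {
                case ' ':      counts.merge("space", 1, Integer::sum); break;
                case '\t':     counts.merge("tab", 1, Integer::sum); break;
                case '\n':     counts.merge("LF", 1, Integer::sum); break;
                case '\r':     counts.merge("CR", 1, Integer::sum); break;
                case '\u00A0': counts.merge("NBSP", 1, Integer::sum); break;
                default:       break;
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Read the file named on the command line, or fall back to a small sample string.
        String text = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "Nov. 29.\r\nMr. Vinken\u00A0is chairman.\n";
        System.out.println(countWhitespace(text));
    }
}
```

Running it on both machines' input files and comparing the CR and NBSP counts would show whether the two tests really fed the detector the same text.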

Does your test with the API also behave differently on Windows?

Jörn


Re: SentenceDetector not working on Linux

Posted by Muhammad Dhito Prihardhanto <mu...@gmail.com>.
I'm just trying to solve your problem...

I think you may have copied the text from that link without rearranging
it. The sentences in the text you copied have this
form:

Pierre Vinken, 61 years old, will join the board as a nonexecutive
director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., <newline>
the Dutch publishing group. Rudolph Agnew, 55 years old and former
chairman of Consolidated Gold Fields PLC, <newline>
was named a director of this British industrial conglomerate.

Note that the text is separated by newlines, so it has 3 lines, and
each line is treated as a paragraph on its own. You must first combine
the lines by removing the newlines before you can use them as
input.
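The pre-processing step described above can be sketched in stdlib-only Java. This is a minimal sketch, not an OpenNLP API: the `JoinWrappedLines` class and `joinWrappedLines` helper are hypothetical, and treating a blank line as the boundary between real paragraphs is an assumption about how the input is laid out. Every non-blank line inside a paragraph is trimmed and joined to the next with a single space, so each paragraph ends up on one line before it is given to the sentence detector.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinWrappedLines {

    /**
     * Hypothetical pre-processing step: blank lines separate paragraphs, and the
     * wrapped lines inside each paragraph are joined with a single space.
     */
    static List<String> joinWrappedLines(String text) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // "\\R" (Java 8+) matches "\n", "\r\n" and "\r", so Linux and Windows files both work.
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current paragraph, if any.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String copied = "Pierre Vinken, 61 years old, will join the board as a nonexecutive\n"
                + "director Nov. 29. Mr. Vinken is chairman of Elsevier N.V.,\n"
                + "the Dutch publishing group.\n";
        // Prints the three wrapped lines as a single line, ready to be used as input.
        for (String paragraph : joinWrappedLines(copied)) {
            System.out.println(paragraph);
        }
    }
}
```

Writing each joined paragraph out on its own line keeps the one-line-per-paragraph layout the command line tool expects, while removing the line breaks that were only there because of how the text was wrapped.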

Hopefully you get what I mean and can solve your problem.

--
Muhammad Dhito
