Posted to users@opennlp.apache.org by György Chityil <gy...@gmail.com> on 2011/08/15 14:00:33 UTC
POSTagger handling of punctuation
Hello,
I noticed the POSTagger appends the tag after the punctuation that follows a word, like this:
questions?_NN
specifics,_NN
I guess it should be like
questions_NN?
specifics_NN,
Cheers,
Gyuri
Re: POSTagger handling of punctuation
Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/15/11 2:47 PM, György Chityil wrote:
> It seems Dr. and the first double quotes are not tokenized. I guess Dr.
> should not be tokenized, while the double quotes are missed in this case.
You are getting this as a token: "Dr.
It is not a bug in our code, but rather a problem with the statistical
model. Such mistakes are usually fixed by adding more training data.
Jörn
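For anyone who wants to try the "more training data" route: to my knowledge the OpenNLP tokenizer trainer reads one sentence per line, with tokens separated either by whitespace or by a <SPLIT> marker where the original text has no space between them. Please check the manual for your version before relying on this. The sketch below builds such a line from a sentence and its hand-corrected tokens. The class and method names are made up for illustration and are not part of OpenNLP.

```java
class SplitTrainingLine {

    // Hypothetical helper (not an OpenNLP API): rebuilds the sentence from
    // hand-corrected tokens, writing a space where the original text had
    // whitespace between two tokens and "<SPLIT>" where it had none.
    static String toTrainingLine(String sentence, String[] tokens) {
        StringBuilder line = new StringBuilder();
        int pos = 0; // index just past the previous token in the sentence
        for (int i = 0; i < tokens.length; i++) {
            int start = sentence.indexOf(tokens[i], pos);
            if (start < 0) {
                throw new IllegalArgumentException("token not found: " + tokens[i]);
            }
            // Anything between two tokens must be whitespace only.
            if (!sentence.substring(pos, start).trim().isEmpty()) {
                throw new IllegalArgumentException("text not covered before: " + tokens[i]);
            }
            if (i > 0) {
                line.append(start > pos ? " " : "<SPLIT>");
            }
            line.append(tokens[i]);
            pos = start + tokens[i].length();
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String sentence = "such as \"500 tips\" and";
        String[] tokens = {"such", "as", "\"", "500", "tips", "\"", "and"};
        // prints: such as "<SPLIT>500 tips<SPLIT>" and
        System.out.println(toTrainingLine(sentence, tokens));
    }
}
```

Lines like this, covering the quote and abbreviation cases the model gets wrong, would be added to the existing training set before retraining.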
Re: POSTagger handling of punctuation
Posted by György Chityil <gy...@gmail.com>.
Hello Jörn,
While testing, I think I found some issues.
Here is a made-up sample sentence I tried just now to test punctuation:
"
Dr. George wrote this book; it's his second publication after publishing
tons of books such as "500 tips" and "kick by kick" on top of the list.
"
Tokenizer gives this:
"
Dr. George wrote this book ; it 's his second publication after publishing
tons of books such as "500 tips " and "kick by kick " on top of the list .
"
It seems Dr. and the first double quotes are not tokenized. I guess Dr.
should not be tokenized, while the double quotes are missed in this case.
Cheers,
Gyuri
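A quick way to spot this kind of tokenizer miss in a longer text is to scan the whitespace-separated output for tokens that still have a double quote attached to other characters. The sketch below does only that. It is a plain-Java check with made-up names, not an OpenNLP feature, and it assumes the tokenizer output is one space-separated string as shown above.

```java
import java.util.ArrayList;
import java.util.List;

class GluedQuoteCheck {

    // Returns every token that contains a double quote but is longer than
    // the quote itself, i.e. the quote was not split off as its own token.
    static List<String> findGluedQuotes(String tokenizedText) {
        List<String> suspects = new ArrayList<>();
        for (String token : tokenizedText.trim().split("\\s+")) {
            if (token.indexOf('"') >= 0 && token.length() > 1) {
                suspects.add(token);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        String output = "Dr. George wrote this book ; it 's his second publication "
                + "after publishing tons of books such as \"500 tips \" and "
                + "\"kick by kick \" on top of the list .";
        // prints: ["500, "kick]
        System.out.println(findGluedQuotes(output));
    }
}
```

The flagged tokens are good candidates for extra training sentences.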
Re: POSTagger handling of punctuation
Posted by György Chityil <gy...@gmail.com>.
Thanks Jörn, I was unaware that I was supposed to tokenize first :)
I just fed the essay straight to the POSTagger.
I will try to tokenize it now and report back.
Re: POSTagger handling of punctuation
Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/15/11 2:00 PM, György Chityil wrote:
> I noticed the POSTagger appends the tag after the punctuation that follows a word, like this:
>
> questions?_NN
> specifics,_NN
>
> I guess it should be like
>
> questions_NN?
> specifics_NN,
It looks like you don't tokenize the input sentence correctly. Maybe you
can post a little more context, then I can give you a better answer.
Jörn
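To illustrate why tokenizing first matters, here is a minimal plain-Java sketch. It does not use OpenNLP: the tagger is a toy stand-in that labels every word NN and every punctuation token PUNCT (both labels are assumptions for the demo), and the tokenizer is a naive rule that peels punctuation off the ends of each whitespace-separated chunk. In a real pipeline one would run an OpenNLP tokenizer (for example TokenizerME) and pass its tokens to POSTaggerME.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenizeBeforeTagging {

    // Toy stand-in for a trained tagger: tokens made only of non-alphanumeric
    // characters get "PUNCT", everything else gets "NN".
    static String toyTag(String token) {
        boolean onlyPunct = token.chars().allMatch(c -> !Character.isLetterOrDigit(c));
        return onlyPunct ? "PUNCT" : "NN";
    }

    // Naive tokenizer: split on whitespace, then separate leading and
    // trailing punctuation characters from each chunk.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            int start = 0;
            int end = chunk.length();
            while (start < end && !Character.isLetterOrDigit(chunk.charAt(start))) {
                tokens.add(String.valueOf(chunk.charAt(start)));
                start++;
            }
            int tailStart = end;
            while (tailStart > start && !Character.isLetterOrDigit(chunk.charAt(tailStart - 1))) {
                tailStart--;
            }
            if (tailStart > start) {
                tokens.add(chunk.substring(start, tailStart));
            }
            for (int i = tailStart; i < end; i++) {
                tokens.add(String.valueOf(chunk.charAt(i)));
            }
        }
        return tokens;
    }

    // Formats tokens in the word_TAG style seen in this thread.
    static String format(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String token : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(token).append('_').append(toyTag(token));
        }
        return sb.toString();
    }

    // Tagging raw text: punctuation stays glued to the word.
    static String tagWhitespaceOnly(String text) {
        return format(Arrays.asList(text.trim().split("\\s+")));
    }

    // Tagging after tokenization: punctuation is its own token.
    static String tagTokenized(String text) {
        return format(tokenize(text));
    }

    public static void main(String[] args) {
        String text = "questions? specifics,";
        System.out.println(tagWhitespaceOnly(text)); // questions?_NN specifics,_NN
        System.out.println(tagTokenized(text));      // questions_NN ?_PUNCT specifics_NN ,_PUNCT
    }
}
```

Note that this naive rule would also split "Dr." into "Dr" and ".", which is exactly the kind of case a statistical tokenizer is meant to handle better.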