Posted to issues@opennlp.apache.org by "Ioan Barbulescu (JIRA)" <ji...@apache.org> on 2014/01/10 14:21:50 UTC
[jira] [Created] (OPENNLP-629) Third person singular verbs are wrongly tagged as NNS instead of VBG
Ioan Barbulescu created OPENNLP-629:
---------------------------------------
Summary: Third person singular verbs are wrongly tagged as NNS instead of VBG
Key: OPENNLP-629
URL: https://issues.apache.org/jira/browse/OPENNLP-629
Project: OpenNLP
Issue Type: Bug
Components: Parser
Affects Versions: tools-1.5.3
Environment: Windows, Java 8
Reporter: Ioan Barbulescu
Priority: Minor
Hi team,
In many cases, third-person singular verbs are wrongly tagged as NNS instead of VBG.
For example, for "the dog barks", we get the following tag sequences (each with its score) and parses:
-0.4873670715270621 = DT/0.9543650543068872 NN/0.9934635416261295 NNS/0.6478473815054814
-2.3176263333647076 = DT/0.9543650543068872 NN/0.9934635416261295 ./0.10389656993335769
-2.5438814602384756 = DT/0.9543650543068872 NN/0.9934635416261295 POS/0.08285903227408052
-3.1472424852917578 = DT/0.9543650543068872 NN/0.9934635416261295 VBG/0.045321418371414506
-3.3093737662787484 = DT/0.9543650543068872 NN/0.9934635416261295 RB/0.03853814197383135
-3.785492750117388 = DT/0.9543650543068872 NN/0.9934635416261295 IN/0.023939491699927738
-4.419574088556415 = DT/0.9543650543068872 NN/0.9934635416261295 NN/0.0126980460743554
-4.641227787202645 = DT/0.9543650543068872 NN/0.9934635416261295 WDT/0.010173582713485872
-4.645517470925252 = DT/0.9543650543068872 NN/0.9934635416261295 :/0.010130034731632277
-5.319832699567059 = DT/0.9543650543068872 NN/0.9934635416261295 ''/0.005161305328825757
(TOP (NP (DT the) (NN dog) (NNS barks)))
2.6064504449697834
(TOP (S (NP (DT the) (NN dog)) (NNS barks)))
1.9485980564427359
The highest probability for the third token is for NNS, by far: about 0.65.
In comparison, VBG gets a probability of only about 0.045.
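As a sanity check on how to read the list above: each leading score appears to be the sum of the natural logarithms of the three per-tag probabilities. The following is a minimal Java sketch under that assumption. The class and method names are hypothetical, and it does not call OpenNLP; it only redoes the arithmetic for the NNS and VBG lines.

```java
// Hypothetical helper, not part of OpenNLP: recomputes a sequence score
// from the per-tag probabilities printed in the output above.
final class SequenceScoreCheck {

    // Assumption: the sequence score is the sum of ln(p) over all tags.
    static double logScore(double... tagProbabilities) {
        double score = 0.0;
        for (double p : tagProbabilities) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        // Values copied from the "the dog barks" output.
        double dt = 0.9543650543068872;
        double nn = 0.9934635416261295;
        double nns = 0.6478473815054814;
        double vbg = 0.045321418371414506;

        // Compare with the reported -0.4873670715270621 (DT NN NNS)
        System.out.println("DT NN NNS: " + logScore(dt, nn, nns));
        // Compare with the reported -3.1472424852917578 (DT NN VBG)
        System.out.println("DT NN VBG: " + logScore(dt, nn, vbg));
    }
}
```

If this reading is right, the gap between the two sequences comes entirely from the third token, since the DT and NN probabilities are identical in every line.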
This tagging error shows up consistently, for most occurrences of third-person singular verbs, regardless of the context.
Am I missing something?
Maybe there is some supplementary configuration that controls this?
Can this be fixed only through code, or do we need to patch our training data set?
Thank you so much.
BR,
Ioan