Posted to issues@opennlp.apache.org by "Ioan Barbulescu (JIRA)" <ji...@apache.org> on 2014/01/10 14:21:50 UTC

[jira] [Created] (OPENNLP-629) Third person singular verbs are wrongly tagged as NNS instead of VBG

Ioan Barbulescu created OPENNLP-629:
---------------------------------------

             Summary: Third person singular verbs are wrongly tagged as NNS instead of VBG
                 Key: OPENNLP-629
                 URL: https://issues.apache.org/jira/browse/OPENNLP-629
             Project: OpenNLP
          Issue Type: Bug
          Components: Parser
    Affects Versions: tools-1.5.3
         Environment: Windows, java 8
            Reporter: Ioan Barbulescu
            Priority: Minor


Hi team,

In many cases, third-person singular verbs are wrongly tagged as NNS instead of VBG.

For example, for "the dog barks", we get the following parsing results:

-0.4873670715270621 = DT/0.9543650543068872 NN/0.9934635416261295 NNS/0.6478473815054814 
-2.3176263333647076 = DT/0.9543650543068872 NN/0.9934635416261295 ./0.10389656993335769 
-2.5438814602384756 = DT/0.9543650543068872 NN/0.9934635416261295 POS/0.08285903227408052 
-3.1472424852917578 = DT/0.9543650543068872 NN/0.9934635416261295 VBG/0.045321418371414506 
-3.3093737662787484 = DT/0.9543650543068872 NN/0.9934635416261295 RB/0.03853814197383135 
-3.785492750117388 = DT/0.9543650543068872 NN/0.9934635416261295 IN/0.023939491699927738 
-4.419574088556415 = DT/0.9543650543068872 NN/0.9934635416261295 NN/0.0126980460743554 
-4.641227787202645 = DT/0.9543650543068872 NN/0.9934635416261295 WDT/0.010173582713485872 
-4.645517470925252 = DT/0.9543650543068872 NN/0.9934635416261295 :/0.010130034731632277 
-5.319832699567059 = DT/0.9543650543068872 NN/0.9934635416261295 ''/0.005161305328825757 
(TOP (NP (DT the) (NN dog) (NNS barks)))
2.6064504449697834
(TOP (S (NP (DT the) (NN dog)) (NNS barks)))
1.9485980564427359

The highest probability for the third token goes to NNS by a wide margin: about 0.65.
In comparison, VBG gets a probability of only about 0.045.
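
For anyone reading the listing: the number at the start of each line appears to be the sum of the natural logs of the three per-tag probabilities on that line. Below is a minimal sketch to check this, using only the Java standard library. The class and method names (TagSequenceScore, logScore) are illustrative and not part of OpenNLP; the probabilities are copied from the first two lines of the output above.

```java
public class TagSequenceScore {

    /** Sums the natural logs of the per-tag probabilities of one tag sequence. */
    public static double logScore(double... tagProbs) {
        double sum = 0.0;
        for (double p : tagProbs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // DT NN NNS: probabilities from the first line of the listing
        double nns = logScore(0.9543650543068872, 0.9934635416261295, 0.6478473815054814);
        // DT NN . : probabilities from the second line of the listing
        double period = logScore(0.9543650543068872, 0.9934635416261295, 0.10389656993335769);

        System.out.println("DT NN NNS: " + nns);
        System.out.println("DT NN .  : " + period);
    }
}
```

If this reading is right, the two printed values should agree with the reported scores (-0.4873670715270621 and -2.3176263333647076) up to floating-point rounding, so the ranking is driven entirely by the probability of the third tag.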

This parsing error shows up consistently for most occurrences of third-person singular verbs, regardless of the context.

Am I missing something? 
Maybe there is some supplementary configuration that controls this?

Can this be fixed only through code, or do we need to patch our training data set?

Thank you so much.

BR,
Ioan



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)