Posted to issues@opennlp.apache.org by "Steven Owens (JIRA)" <ji...@apache.org> on 2015/10/03 23:07:26 UTC

[jira] [Commented] (OPENNLP-820) parser is mistagging quotes

    [ https://issues.apache.org/jira/browse/OPENNLP-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942454#comment-14942454 ] 

Steven Owens commented on OPENNLP-820:
--------------------------------------

As you can see, the part-of-speech tagger doesn't have this problem. A simple way to fix this issue, and others like it, would seem to be making the POSTaggerFactory and ParserChunkerFactory used in training the parser configurable, either by allowing them to be passed to the train method or by storing them in some kind of ParserFactory (and moving the HeadRules into that as well). The parser model could then be retrained using a POSTaggerFactory like the one used for creating the POSTagger model. I can do the code refactoring work; I just want a second opinion that this is the right solution.
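
A rough sketch of what such a refactoring could look like. Every type here is a hypothetical stand-in, not a real OpenNLP class: TaggerFactory, ChunkerFactory, HeadRules, ParserFactory, and the train signature are all assumptions, included only to show the shape of the idea (one object bundling the three components, with defaults that can be overridden before retraining).

```java
import java.util.Objects;

/** Hypothetical sketch only: none of these types are the real OpenNLP classes. */
public class ParserFactorySketch {

    /** Stand-in for the factory that configures the POS tagger used inside the parser. */
    interface TaggerFactory { String describe(); }

    /** Stand-in for the factory that configures the chunker used inside the parser. */
    interface ChunkerFactory { String describe(); }

    /** Stand-in for the head rules, which would move into the factory as well. */
    interface HeadRules { String describe(); }

    /** Bundles the components that parser training would otherwise hard-code. */
    static final class ParserFactory {
        private final TaggerFactory taggerFactory;
        private final ChunkerFactory chunkerFactory;
        private final HeadRules headRules;

        ParserFactory(TaggerFactory taggerFactory, ChunkerFactory chunkerFactory, HeadRules headRules) {
            this.taggerFactory = Objects.requireNonNull(taggerFactory);
            this.chunkerFactory = Objects.requireNonNull(chunkerFactory);
            this.headRules = Objects.requireNonNull(headRules);
        }

        /** Keeps the current behaviour available for callers that do not customize anything. */
        static ParserFactory withDefaults(HeadRules headRules) {
            return new ParserFactory(
                () -> "default tagger factory",
                () -> "default chunker factory",
                headRules);
        }

        TaggerFactory taggerFactory() { return taggerFactory; }
        ChunkerFactory chunkerFactory() { return chunkerFactory; }
        HeadRules headRules() { return headRules; }
    }

    /** Stand-in for a train method: it only reports which components it would use. */
    static String train(ParserFactory factory) {
        return "tagger=" + factory.taggerFactory().describe()
            + ", chunker=" + factory.chunkerFactory().describe()
            + ", headRules=" + factory.headRules().describe();
    }

    public static void main(String[] args) {
        HeadRules rules = () -> "English head rules";

        // Current behaviour: everything comes from the defaults.
        System.out.println(train(ParserFactory.withDefaults(rules)));

        // Proposed behaviour: reuse the tagger configuration of the POS model.
        ParserFactory custom = new ParserFactory(
            () -> "tagger factory matching the POS model",
            () -> "default chunker factory",
            rules);
        System.out.println(train(custom));
    }
}
```

Passing a single factory object rather than extra train arguments would keep the method signature stable if more components become configurable later; whether that fits the existing API is the open question.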

> parser is mistagging quotes
> ---------------------------
>
>                 Key: OPENNLP-820
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-820
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.6.0
>            Reporter: Steven Owens
>              Labels: english
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The parser is mistagging quotes (both single and double) with the default English model. I notice it most on opening quotes, but it also happens to closing quotes.
> Example: (TOP (NP (NP-S-NP (NN "))(ADVP-C-NP (RB Here))(. ?)(. ")))  Both double quotes should be labeled '' (two single quotes).
> The same sentence labeled with the part-of-speech tagger using the default English model: "__`` Here_RB ?_. "_''


