Posted to issues@opennlp.apache.org by "Łukasz Dróżdż (JIRA)" <ji...@apache.org> on 2015/04/16 17:42:59 UTC

[jira] [Commented] (OPENNLP-287) Extend POS Tagger documentation with more information about the tag dictionary

    [ https://issues.apache.org/jira/browse/OPENNLP-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498191#comment-14498191 ] 

Łukasz Dróżdż commented on OPENNLP-287:
---------------------------------------

Hi,

Here's my attempt at providing a sample POS dictionary file, along with test code for programmatic usage: reading the dictionary in, writing it back out, and using it to train a POS tagger. See the attached files for details.

The XML structure of a POS dictionary is:

<?xml version="1.0" encoding="UTF-8"?>
<dictionary>
  <entry tags="tag1 tag2">
    <token>token1</token>
  </entry>
  <entry tags="tag1">
    <token>token2</token>
  </entry>
</dictionary>
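
To make the layout concrete, here is a rough sketch that reads a file of this shape into a token-to-tags map. It uses only the JDK's DOM parser, not OpenNLP's own dictionary classes, and it is not the attached test code. The class name PosDictionarySketch and the method parse are hypothetical names chosen for illustration, and the sketch assumes that tags are separated by whitespace and that each entry holds exactly one token, as in the sample above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenNLP: reads the XML layout shown above
// into a token -> tags map using only JDK classes.
final class PosDictionarySketch {

    static Map<String, List<String>> parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // LinkedHashMap keeps the entries in file order.
        Map<String, List<String>> tagsByToken = new LinkedHashMap<>();
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            // Assumption: the "tags" attribute lists the allowed tags, separated by whitespace.
            List<String> tags = Arrays.asList(entry.getAttribute("tags").trim().split("\\s+"));
            // Assumption: each <entry> holds exactly one <token> child, as in the sample.
            String token = entry.getElementsByTagName("token").item(0).getTextContent().trim();
            tagsByToken.put(token, tags);
        }
        return tagsByToken;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<dictionary>"
                + "<entry tags=\"tag1 tag2\"><token>token1</token></entry>"
                + "<entry tags=\"tag1\"><token>token2</token></entry>"
                + "</dictionary>";
        // Prints the parsed token -> tags map for the sample dictionary.
        System.out.println(parse(xml));
    }
}
```

For the sample above, the map associates token1 with tag1 and tag2, and token2 with tag1 only. For real use, the dictionary support that ships with OpenNLP is the better choice; this sketch only illustrates how the entry, tags, and token pieces fit together.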

Hope that helps.

> Extend POS Tagger documentation with more information about the tag dictionary
> ------------------------------------------------------------------------------
>
>                 Key: OPENNLP-287
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-287
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Documentation, POS Tagger
>            Reporter: Joern Kottmann
>            Priority: Minor
>         Attachments: TaggerDictionaryTest.java, dictionary.xml, en-pos.train
>
>
> Extend the POS Tagger tag dictionary section as described in the documentation.


