Posted to issues@opennlp.apache.org by "Rodrigo Agerri (JIRA)" <ji...@apache.org> on 2013/09/30 12:00:26 UTC

[jira] [Updated] (OPENNLP-582) Add lemmatizer functionality

     [ https://issues.apache.org/jira/browse/OPENNLP-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rodrigo Agerri updated OPENNLP-582:
-----------------------------------

    Attachment: lemmatizer-prelim.patch

Hi Jörn, 

I have attached a first patch for the lemmatizer functionality. For now it includes only the classes required for the API to work. As we discussed by email some time ago, I have included a DictionaryLemmatizer interface and three implementations of it:

1. JWNL-based.
2. HashMap-based (loads the dictionary into RAM); it uses en-lemmas.dict.
3. Morfologik-based (binary format for large dictionaries); it uses english.dict.

I have used it in several tools to lemmatize text in several languages, and it works as expected. The only requirement is to provide a dictionary in the required format for the given language. The exception is the JWNL implementation, whose API works only for English; to be honest, I included it because I had already implemented it, but I do not think it is that useful.
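
To illustrate the HashMap-based variant, here is a minimal sketch of the idea. It is not the code in the patch. The DictionaryLemmatizer name comes from the description above, but the lemmatize(word, postag) signature, the SimpleLemmatizer class name, and the tab-separated "word, postag, lemma" entry format are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Assumed shape of the interface; the real signature in the patch may differ.
interface DictionaryLemmatizer {
    String lemmatize(String word, String postag);
}

// Hypothetical HashMap-backed implementation: the whole dictionary is kept in memory.
class SimpleLemmatizer implements DictionaryLemmatizer {
    private final Map<String, String> dictMap = new HashMap<>();

    // Assumed entry format: "word<TAB>postag<TAB>lemma", one entry per line.
    SimpleLemmatizer(Iterable<String> lines) {
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length == 3) {
                dictMap.put(key(fields[0], fields[1]), fields[2]);
            }
        }
    }

    // The lookup key combines the lowercased word form with its POS tag.
    private static String key(String word, String postag) {
        return word.toLowerCase(Locale.ROOT) + "\t" + postag;
    }

    @Override
    public String lemmatize(String word, String postag) {
        String lemma = dictMap.get(key(word, postag));
        // Fall back to the word form itself when the dictionary has no entry.
        return lemma != null ? lemma : word;
    }
}

class LemmatizerDemo {
    public static void main(String[] args) {
        DictionaryLemmatizer lemmatizer = new SimpleLemmatizer(
            Arrays.asList("houses\tNNS\thouse", "went\tVBD\tgo"));
        System.out.println(lemmatizer.lemmatize("Houses", "NNS"));
        System.out.println(lemmatizer.lemmatize("went", "VBD"));
    }
}
```

Returning the word form unchanged for unknown entries is only one possible choice; an implementation could equally return null or a marker value so that callers can detect dictionary misses.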

I know that much more needs to be done before this package can be included in the project for v1.6.0, but please let me know whether the developers agree with the current structure before I carry on.

If you agree, please point out what I need to do next (CLI issues, tests, etc.).  

Cheers, 

Rodrigo 

> Add lemmatizer functionality
> ----------------------------
>
>                 Key: OPENNLP-582
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-582
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: POS Tagger
>    Affects Versions: 1.6.0
>            Reporter: Rodrigo Agerri
>         Attachments: lemmatizer-prelim.patch
>
>
> Add new functionality to perform dictionary-based lemmatization. It will look up a word form and POS tag in a dictionary and return the corresponding lemma.


