Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2014/01/06 17:33:55 UTC

[jira] [Commented] (OPENNLP-116) Define low level Classifier API which only works on ordered int features

    [ https://issues.apache.org/jira/browse/OPENNLP-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863084#comment-13863084 ] 

Joern Kottmann commented on OPENNLP-116:
----------------------------------------

Many machine learning libraries only work on vectors. Integrating them would be easier if OpenNLP itself took care of mapping string features to int features.
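
To illustrate the kind of mapping meant here, the following is a minimal sketch that assigns each distinct string feature a dense int id and turns the string features of one event into a sorted int array. The class FeatureIndex and its methods are hypothetical names chosen for this example; they are not part of the OpenNLP API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not an OpenNLP class.
public class FeatureIndex {

    // Maps each string feature seen so far to its int id.
    private final Map<String, Integer> index = new HashMap<>();

    // Returns the id already assigned to the feature, or assigns the next free id.
    public int idOf(String feature) {
        Integer id = index.get(feature);
        if (id == null) {
            id = index.size();
            index.put(feature, id);
        }
        return id;
    }

    // Maps the string features of one event to a sorted array of int ids.
    public int[] map(String... features) {
        int[] ids = new int[features.length];
        for (int i = 0; i < features.length; i++) {
            ids[i] = idOf(features[i]);
        }
        Arrays.sort(ids);
        return ids;
    }

    // Number of distinct features seen so far, i.e. the vector dimension.
    public int size() {
        return index.size();
    }
}
```

An external vector-based library could then consume the int array directly, with size() giving the dimension of the feature space.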

> Define low level Classifier API which only works on ordered int features
> ------------------------------------------------------------------------
>
>                 Key: OPENNLP-116
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-116
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Maxent
>            Reporter: Joern Kottmann
>
> The maxent/perceptron code currently maps String features to low level int features. Most of the code is clearly separated between these two kinds of features, but the separation is not complete. There should be a clearly separated API for dealing with high level features and low level features. The API should also support mapping high level features to low level features.
> The goal of the separation is to also allow non-string features to be mapped to the low level int features. Non-string features could be hash int features or hash long features, or a different representation of a string, e.g. UTF-8 bytes.
> Previous discussions showed that having both levels of API is valuable.
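
One possible shape for the two API levels described in the issue is sketched below. It assumes a low level classifier that only sees sorted int features and a generic mapper that bridges from any high level feature type. All names (TwoLevelApiSketch, IntClassifier, FeatureMapper, HashingMapper, classify) are hypothetical and do not exist in OpenNLP; the hashing mapper over UTF-8 bytes is just one example of a non-string feature representation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration only; none of these types exist in OpenNLP.
public class TwoLevelApiSketch {

    // Low level API: works only on ordered int features.
    public interface IntClassifier {
        double[] eval(int[] sortedFeatures);
    }

    // Bridge between the levels: maps one high level feature to a low level int.
    public interface FeatureMapper<T> {
        int map(T feature);
    }

    // Example mapper for a non-string representation: hashes raw bytes
    // (for instance UTF-8 bytes) into a fixed number of buckets.
    public static class HashingMapper implements FeatureMapper<byte[]> {
        private final int buckets;

        public HashingMapper(int buckets) {
            this.buckets = buckets;
        }

        @Override
        public int map(byte[] feature) {
            // floorMod keeps the result in [0, buckets) even for negative hashes.
            return Math.floorMod(Arrays.hashCode(feature), buckets);
        }
    }

    // High level API: maps the features, sorts the ids, then delegates to the low level.
    public static <T> double[] classify(IntClassifier low, FeatureMapper<T> mapper, List<T> features) {
        int[] ids = new int[features.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = mapper.map(features.get(i));
        }
        Arrays.sort(ids);
        return low.eval(ids);
    }
}
```

With this split, a string-based mapper and a hash-based mapper could share the same low level classifier, since the classifier never sees the original feature type.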



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)