Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2014/01/06 17:33:55 UTC
[jira] [Commented] (OPENNLP-116) Define low level Classifier API which only works on ordered int features
[ https://issues.apache.org/jira/browse/OPENNLP-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863084#comment-13863084 ]
Joern Kottmann commented on OPENNLP-116:
----------------------------------------
Many machine learning libraries only work on numeric vectors. Integrating them would be easier if OpenNLP itself took care of mapping string features to int features.
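
To illustrate the kind of mapping meant here, the following is a minimal sketch and not existing OpenNLP code. The class name FeatureIndex and its methods are hypothetical. It assigns each distinct string feature the next free int id and returns the ids of an event's features as a sorted, duplicate-free array, which is roughly the input a vector-based library would expect.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the OpenNLP API.
final class FeatureIndex {
    // Maps each string feature to the int id it was assigned on first sight.
    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the id of a feature, assigning the next free id if it is new. */
    int idOf(String feature) {
        Integer id = ids.get(feature);
        if (id == null) {
            id = ids.size();
            ids.put(feature, id);
        }
        return id;
    }

    /** Maps string features to a sorted array of distinct int ids. */
    int[] encode(String... features) {
        int[] out = new int[features.length];
        for (int i = 0; i < features.length; i++) {
            out[i] = idOf(features[i]);
        }
        return Arrays.stream(out).distinct().sorted().toArray();
    }

    /** Number of distinct features seen so far. */
    int size() {
        return ids.size();
    }
}
```

With such an index, an external library would only ever see int ids, and the index itself would have to be stored with the model so the same ids are used at prediction time.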
> Define low level Classifier API which only works on ordered int features
> ------------------------------------------------------------------------
>
> Key: OPENNLP-116
> URL: https://issues.apache.org/jira/browse/OPENNLP-116
> Project: OpenNLP
> Issue Type: Improvement
> Components: Maxent
> Reporter: Joern Kottmann
>
> The maxent/perceptron code currently maps String features to low level int features. Most of the code is clearly separated between these two kinds of features, but the separation is not complete. There should be a clearly separated API for dealing with high level features and low level features. The API should also support mapping high level features to low level features.
> The goal of the separation is to also allow non-string features to be mapped to the low level int features. Non-string features could be hash int features or hash long features, or a different representation of a string, e.g. UTF-8 bytes.
> In previous discussions it turned out that having both levels of API is valuable.
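
One possible shape for the two API levels described in the issue is sketched below. All names (IntClassifier, MappingClassifier, HashedFeatures) are assumptions made for illustration and do not exist in OpenNLP. The low level interface only sees ordered int features, the high level wrapper is generic over the feature type, and the example mapper hashes the UTF-8 bytes of a string into a fixed number of buckets.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical low level API: works only on ordered int features.
interface IntClassifier {
    double[] eval(int[] sortedFeatureIds);
}

// Hypothetical high level API: maps features of any type F to ints,
// sorts them, and delegates to the low level classifier.
final class MappingClassifier<F> {
    private final ToIntFunction<F> mapper;
    private final IntClassifier delegate;

    MappingClassifier(ToIntFunction<F> mapper, IntClassifier delegate) {
        this.mapper = mapper;
        this.delegate = delegate;
    }

    double[] eval(List<F> features) {
        int[] ids = new int[features.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = mapper.applyAsInt(features.get(i));
        }
        Arrays.sort(ids); // the low level API expects ordered ids
        return delegate.eval(ids);
    }
}

// Example mapper: hash the UTF-8 bytes of a string into [0, buckets).
final class HashedFeatures {
    static int hashString(String feature, int buckets) {
        byte[] utf8 = feature.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(utf8), buckets);
    }
}
```

In this sketch a dictionary lookup, a hash function, or any other mapping can be plugged in as the mapper without touching the low level classifier. Hashing can map different features to the same id, which a real design would have to account for.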