Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2013/06/04 10:31:20 UTC

[jira] [Commented] (OPENNLP-581) Add Pluggable Machine Learning support

    [ https://issues.apache.org/jira/browse/OPENNLP-581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674167#comment-13674167 ] 

Joern Kottmann commented on OPENNLP-581:
----------------------------------------

The maxent code is now merged into OpenNLP Tools. I spent some time replacing all usages of AbstractModel with MaxentModel (will commit soon) and ran into the question of how to serialize/deserialize a statistical model inside the model package.

To store or load a resource, the model needs to use the correct serializer. Currently we have a hard-coded list of serializers which can be used,
and it's possible to extend this list by creating a custom factory.

However, the user should not have to provide a custom factory just to use an external machine learning library, so we need to come up with a new solution.

I see the following options:
- Use the MachineLearningFactory proposed above to register the serializers (similar to the current factory)
- Define a special interface which reveals the serializer for a given resource, and use it to store a map of resource names and serializer names inside the manifest.properties

Any opinions? I think the second option is more elegant and might be handy for other use cases too.
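
To make the second option a bit more concrete, here is a rough sketch of how it might look. Every name in it (ResourceSerializer, SerializableResource, SerializerManifest, ToyModel, the "serializer." key prefix) is made up for illustration and is not existing OpenNLP API; only java.util.Properties and the reflection calls are real. The idea is that a resource reveals the class name of its serializer, the name is recorded in the manifest when the model is written, and it is looked up again when the model is loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical serializer contract (not an existing OpenNLP interface).
interface ResourceSerializer<T> {
    T read(InputStream in) throws IOException;
    void write(T resource, OutputStream out) throws IOException;
}

// Hypothetical "special interface": a resource reveals its serializer by class name.
interface SerializableResource {
    String getSerializerClassName();
}

// Toy stand-in for a statistical model coming from an external ML library.
final class ToyModel implements SerializableResource {
    final String label;
    ToyModel(String label) { this.label = label; }
    @Override
    public String getSerializerClassName() { return ToyModelSerializer.class.getName(); }
}

final class ToyModelSerializer implements ResourceSerializer<ToyModel> {
    @Override
    public ToyModel read(InputStream in) throws IOException {
        return new ToyModel(new DataInputStream(in).readUTF());
    }
    @Override
    public void write(ToyModel model, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeUTF(model.label);
        data.flush();
    }
}

final class SerializerManifest {
    // Assumed key prefix for the resource-name -> serializer-name entries.
    static final String PREFIX = "serializer.";

    // On store: record the serializer name of every resource that reveals one.
    static Properties record(Map<String, ?> resources) {
        Properties manifest = new Properties();
        for (Map.Entry<String, ?> entry : resources.entrySet()) {
            if (entry.getValue() instanceof SerializableResource) {
                SerializableResource resource = (SerializableResource) entry.getValue();
                manifest.setProperty(PREFIX + entry.getKey(), resource.getSerializerClassName());
            }
        }
        return manifest;
    }

    // On load: instantiate the serializer named in the manifest for this resource.
    static ResourceSerializer<?> lookup(Properties manifest, String resourceName)
            throws ReflectiveOperationException {
        String className = manifest.getProperty(PREFIX + resourceName);
        if (className == null) {
            throw new IllegalArgumentException("No serializer recorded for " + resourceName);
        }
        return (ResourceSerializer<?>) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }
}

class ManifestSketch {
    public static void main(String[] args) throws Exception {
        Map<String, Object> resources = new LinkedHashMap<>();
        resources.put("toy.model", new ToyModel("maxent"));

        // Store side: build the manifest entries and write the resource.
        Properties manifest = SerializerManifest.record(resources);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new ToyModelSerializer().write((ToyModel) resources.get("toy.model"), bytes);

        // Load side: only the manifest tells us which serializer to use.
        ResourceSerializer<?> serializer = SerializerManifest.lookup(manifest, "toy.model");
        Object loaded = serializer.read(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(((ToyModel) loaded).label);
    }
}
```

With something like this, the model package would not need to know the serializers of an external library in advance, as long as the serializer class is on the classpath when the model is loaded.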

                
> Add Pluggable Machine Learning support
> --------------------------------------
>
>                 Key: OPENNLP-581
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-581
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Joern Kottmann
>
> The OpenNLP Tools can currently only use the classifiers inside the Maxent library. It should be possible to plug in 3rd party machine learning libraries which can be integrated as seamlessly as the Maxent library.
> To achieve this, these tasks need to be solved:
> - Define a MachineLearningFactory which is capable of instantiating a Trainer and Classifier based on a given parameter properties file. The Algorithm name could be the name of the factory to use. Additionally, the code in OpenNLP Tools needs to be refactored to use the factory interface instead of the TrainUtil.
>  
> - Refactor the OpenNLP Tools to use an interface instead of the AbstractModel. The interface can be identical to the current MaxentModel, with additional support for serialization.
> - To avoid an interface layer between OpenNLP Tools and Maxent, the maxent classes should be moved to opennlp.tools.ml.
