Posted to dev@opennlp.apache.org by "Giaconia, Mark [USA]" <Gi...@bah.com> on 2013/06/02 01:58:12 UTC

RE: [External] Re: Pluggable Machine Learning support

I am still becoming familiar with how the project is structured internally, but I typically like to separate frameworks from implementations. Perhaps one framework package that holds factories, interfaces, and the like, and another for implementations?

opennlp.tools.ml.framework
opennlp.tools.ml.impls

Let me know if I can help


Mark Giaconia 


-----Original Message-----
From: Samik Raychaudhuri [mailto:samikr@gmail.com] 
Sent: Friday, May 31, 2013 5:39 PM
To: dev@opennlp.apache.org
Subject: [External] Re: Pluggable Machine Learning support

Yep, supporting the move to a new package/namespace.

On 5/31/2013 12:40 AM, Tommaso Teofili wrote:
> big +1!
>
> Tommaso
>
>
> 2013/5/31 William Colen <wi...@gmail.com>
>
>> I don't see any issue. People who use Maxent directly would need to
>> change how they use it, but that is OK for a major release.
>>
>>
>>
>>
>> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <ko...@gmail.com> wrote:
>>
>>> Are there any objections to moving the maxent/perceptron classes to an
>>> opennlp.tools.ml package as part of this issue? Moving them
>>> would avoid a second interface layer and probably make using OpenNLP
>>> Tools a bit easier, because then we are down to a single jar.
>>>
>>> Jörn
>>>
>>>
>>> On 05/30/2013 08:57 PM, William Colen wrote:
>>>
>>>> +1 to add pluggable machine learning algorithms
>>>> +1 to improve the API and remove deprecated methods in 1.6.0
>>>>
>>>> You can assign related Jira issues to me and I will be glad to help.
>>>>
>>>>
>>>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann 
>>>> <ko...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>> we spoke about it here and there already. To ensure that OpenNLP can stay
>>>>> competitive with other NLP libraries, I am proposing to make the
>>>>> machine learning pluggable.
>>>>>
>>>>> The extensions should not make it harder to use OpenNLP. If a user loads a
>>>>> model, OpenNLP should be capable of setting up everything by itself
>>>>> without forcing the user to write custom integration code based on
>>>>> the ml implementation.
>>>>> We solved this problem already with the extension mechanism we built to
>>>>> support the customization of our components. I suggest that we reuse this
>>>>> extension mechanism to load an ml implementation. To use a custom
>>>>> ml implementation, the user has to specify the class name of the
>>>>> factory in the Algorithm field of the params file. The params file
>>>>> is available during training and tagging time.
>>>>>
>>>>> Most components in the tools package use the maxent library to do
>>>>> classification. The Java interfaces for this are currently located in the
>>>>> maxent package. To be able to swap the implementation, the
>>>>> interfaces should be defined inside the tools package. To make
>>>>> things easier, I propose to move the maxent and perceptron
>>>>> implementations as well.
>>>>>
>>>>> Throughout the code base we use AbstractModel. That's a bit
>>>>> unfortunate, because the only reason for this is the lack of model
>>>>> serialization support in the MaxentModel interface. A
>>>>> serialization method should be added to it, and it could maybe be
>>>>> renamed to ClassificationModel. This will break backward compatibility in
>>>>> non-standard use cases.
>>>>>
>>>>> To be able to test the extension mechanism, I suggest that we implement an
>>>>> addon which integrates liblinear and the Apache Mahout classifiers.
>>>>>
>>>>> There are still a few deprecated 1.4 constructors and methods in OpenNLP
>>>>> which directly reference interfaces and classes in the maxent
>>>>> library. These need to be removed to be able to move the
>>>>> interfaces to the tools package.
>>>>>
>>>>> Any opinions?
>>>>>
>>>>> Jörn
>>>>>
>>>>>
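
A rough illustration of the proposal quoted above, not code taken from OpenNLP: a factory is looked up by the class name stored in the Algorithm field of the params, and the model is serialized through its interface so that callers would not need AbstractModel. Only ClassificationModel and the Algorithm field are named in the thread. ClassifierFactory, FixedModel, FixedModelFactory, ExtensionLoader, the method signatures, and the use of a plain Map for the params are all assumptions made for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;

/** Hypothetical successor to MaxentModel with the proposed serialization method. */
interface ClassificationModel {
    double[] eval(String[] context);
    String getBestOutcome(double[] scores);
    void serialize(OutputStream out) throws IOException;
}

/** Hypothetical contract that every pluggable ml implementation would provide. */
interface ClassifierFactory {
    ClassificationModel train(Map<String, String> params);
}

/** Toy model: fixed scores for two outcomes, regardless of the context. */
final class FixedModel implements ClassificationModel {
    private final String[] outcomes = {"no", "yes"};
    private final double[] scores = {0.25, 0.75};

    @Override public double[] eval(String[] context) {
        return scores.clone();
    }

    @Override public String getBestOutcome(double[] s) {
        int best = 0;
        for (int i = 1; i < s.length; i++) {
            if (s[i] > s[best]) best = i;
        }
        return outcomes[best];
    }

    @Override public void serialize(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (int i = 0; i < outcomes.length; i++) {
            data.writeUTF(outcomes[i]);
            data.writeDouble(scores[i]);
        }
        data.flush();
    }
}

/** Toy factory; its class name is what would go into the Algorithm field. */
final class FixedModelFactory implements ClassifierFactory {
    @Override public ClassificationModel train(Map<String, String> params) {
        return new FixedModel();
    }
}

final class ExtensionLoader {
    /** Instantiates the factory whose class name is stored under "Algorithm". */
    static ClassifierFactory load(Map<String, String> params) {
        String className = params.get("Algorithm");
        try {
            return Class.forName(className)
                    .asSubclass(ClassifierFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load factory: " + className, e);
        }
    }

    /** Serializes through the interface only, without touching a concrete model class. */
    static byte[] toBytes(ClassificationModel model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            model.serialize(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

With params containing Algorithm=FixedModelFactory, ExtensionLoader.load(params).train(params) yields a model without any implementation-specific code on the caller's side, which is the property the proposal asks for. A real integration would presumably also need a matching deserialization path, which this sketch leaves out.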

Re: [External] Re: Pluggable Machine Learning support

Posted by Jörn Kottmann <ko...@gmail.com>.

In most of the code the interfaces are located in the component-level package,
and in some components we have sub-packages for different implementations.

Jörn
