Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2013/05/30 16:53:51 UTC

Pluggable Machine Learning support

Hi all,

we have spoken about this here and there already: to ensure that OpenNLP 
stays competitive with other NLP libraries, I am proposing to make the 
machine learning pluggable.

The extension should not make OpenNLP harder to use: if a user loads 
a model, OpenNLP should be capable of setting everything up by itself, 
without forcing the user to write custom integration code for the 
ml implementation.
We already solved this problem with the extension mechanism we built to 
support the customization of our components, and I suggest that we reuse 
this mechanism to load a ml implementation. To use a custom ml 
implementation, the user specifies the class name of its factory in 
the Algorithm field of the params file. The params file is available 
at both training and tagging time.
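The loading step described above can be sketched as a reflective lookup of a factory class named in the params file. This is only an illustration of the mechanism: the TrainerFactory interface, the AlgorithmLookup class, and the CustomPerceptronFactory name are all hypothetical stand-ins, not the actual OpenNLP API.

```java
import java.util.Properties;

// Hypothetical trainer factory abstraction; the real OpenNLP
// interfaces and params handling may differ.
interface TrainerFactory {
    String algorithmName();
}

// A user-provided factory, referenced by class name from the params file.
class CustomPerceptronFactory implements TrainerFactory {
    public String algorithmName() { return "custom-perceptron"; }
}

public class AlgorithmLookup {
    // Resolve the factory whose class name is given in the
    // Algorithm field of the training parameters.
    static TrainerFactory fromParams(Properties params) throws Exception {
        String className = params.getProperty("Algorithm");
        return (TrainerFactory) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Properties params = new Properties();
        params.setProperty("Algorithm", "CustomPerceptronFactory");
        TrainerFactory factory = fromParams(params);
        System.out.println(factory.algorithmName());
    }
}
```

Because only a class name is stored in the params file, the same model file can carry enough information for OpenNLP to instantiate the right implementation at both training and tagging time.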

Most components in the tools package use the maxent library for 
classification. The Java interfaces for this are currently located in 
the maxent package; to be able to swap the implementation, the interfaces 
should be defined inside the tools package. To make things easier, I 
propose moving the maxent and perceptron implementations as well.

Throughout the code base we use AbstractModel, which is a bit unlucky, 
because the only reason for this is the lack of model serialization 
support in the MaxentModel interface. A serialization method should be 
added to the interface, and it could perhaps be renamed to 
ClassificationModel. This will break backward compatibility in 
non-standard use cases.
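A sketch of what the renamed interface could look like. The serialize() signature and the toy UniformModel implementation are illustrative assumptions, not the actual OpenNLP API; only eval() mirrors what MaxentModel already offers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical shape of the renamed interface: the eval() method that
// MaxentModel already has, plus the serialization support whose absence
// currently forces code onto AbstractModel.
interface ClassificationModel {
    double[] eval(String[] context);
    void serialize(OutputStream out) throws IOException;
}

// A toy implementation showing that a caller can now persist a model
// through the interface alone, without casting to AbstractModel.
class UniformModel implements ClassificationModel {
    public double[] eval(String[] context) {
        return new double[] { 1.0 };
    }
    public void serialize(OutputStream out) throws IOException {
        out.write("uniform-model".getBytes(StandardCharsets.UTF_8));
    }
}
```

With serialization on the interface, component code can be written against ClassificationModel only, which is what makes swapping in a non-maxent implementation possible.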

To be able to test the extension mechanism, I suggest we implement an 
addon that integrates liblinear and the Apache Mahout classifiers.
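Such an addon would essentially be an adapter from the external library's API to the OpenNLP-side interfaces. A minimal sketch of that shape follows; every name here is a stand-in, none of it is real liblinear or Mahout API.

```java
// Hypothetical OpenNLP-side classification interface.
interface ClassifierAdapter {
    String classify(String[] features);
}

// Stand-in for a third-party model (liblinear, Mahout, ...); a real
// addon would delegate to the actual library here.
class ExternalClassifier {
    String predict(String joinedFeatures) {
        return joinedFeatures.isEmpty() ? "other" : "label";
    }
}

// The addon's job: translate between the two APIs.
class ExternalClassifierAdapter implements ClassifierAdapter {
    private final ExternalClassifier delegate = new ExternalClassifier();

    public String classify(String[] features) {
        // Convert OpenNLP-style feature arrays into whatever input
        // representation the external library expects.
        return delegate.predict(String.join(" ", features));
    }
}
```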

There are still a few deprecated 1.4 constructors and methods in OpenNLP 
that directly reference interfaces and classes in the maxent library; 
these need to be removed before the interfaces can be moved to the tools 
package.

Any opinions?

Jörn

Re: [External] Re: Pluggable Machine Learning support

Posted by Jörn Kottmann <ko...@gmail.com>.
In most of the code the interfaces are located in the component-level 
package, and in some components we have sub-packages for different 
implementations.

Jörn

On 06/02/2013 01:58 AM, Giaconia, Mark [USA] wrote:
> ... perhaps a framework package that holds factories and interfaces and the like, and another for implementations?
>
> opennlp.tools.ml.framework
> opennlp.tools.ml.impls


RE: [External] Re: Pluggable Machine Learning support

Posted by "Giaconia, Mark [USA]" <Gi...@bah.com>.
I am still becoming familiar with how the project is structured internally, but I typically like to separate frameworks from implementations. Perhaps a framework package could hold the factories, interfaces, and the like, with another package for implementations?

opennlp.tools.ml.framework
opennlp.tools.ml.impls

Let me know if I can help


Mark Giaconia 




Re: Pluggable Machine Learning support

Posted by Samik Raychaudhuri <sa...@gmail.com>.
Yep, supporting the move to a new package/namespace.



Re: Pluggable Machine Learning support

Posted by Tommaso Teofili <to...@gmail.com>.
big +1!

Tommaso



Re: Pluggable Machine Learning support

Posted by William Colen <wi...@gmail.com>.
I don't see any issue. People who use Maxent directly would need to
change how they use it, but that is OK for a major release.





Re: Pluggable Machine Learning support

Posted by Jörn Kottmann <ko...@gmail.com>.
Are there any objections to moving the maxent/perceptron classes to an 
opennlp.tools.ml package as part of this issue? Moving them would avoid 
a second interface layer and probably make OpenNLP Tools a bit easier 
to use, because then we would be down to a single jar.

Jörn


Re: Pluggable Machine Learning support

Posted by William Colen <wi...@gmail.com>.
+1 to add pluggable machine learning algorithms
+1 to improve the API and remove deprecated methods in 1.6.0

You can assign related Jira issues to me and I will be glad to help.

