Posted to dev@opennlp.apache.org by "william.colen@gmail.com" <wi...@gmail.com> on 2012/02/07 14:17:59 UTC

Re: Custom feature generators

Hi,

I would like to work on this now: passing a Factory class name to the CLI
tools and saving it in the model as configuration.
Do you still think it is a good idea, or should we find a better way to
load custom feature generators and custom sequence validators? I would like
to do it for the SentenceDetector and POS Tagger for now.
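
A minimal sketch of the idea, under stated assumptions: the ToolFactory
interface, the two factory classes and the "factory" configuration key are
hypothetical names made up for illustration and are not taken from the
OpenNLP API. It only shows how a class name stored with the model could be
turned back into a factory instance through reflection.

```java
import java.util.Properties;

// Hypothetical extension point; not an actual OpenNLP interface.
interface ToolFactory {
    String featureGeneratorName();
}

// Hypothetical default, used when the model configuration names no factory.
class DefaultToolFactory implements ToolFactory {
    public String featureGeneratorName() {
        return "default";
    }
}

// Hypothetical user-supplied factory that would live in an extension jar.
class CustomToolFactory implements ToolFactory {
    public String featureGeneratorName() {
        return "custom";
    }
}

class FactoryLoader {
    // Hypothetical key under which training would store the class name.
    static final String FACTORY_KEY = "factory";

    // Instantiates the factory named in the configuration via reflection,
    // falling back to the default factory when no name is stored.
    static ToolFactory load(Properties modelConfig) {
        String className = modelConfig.getProperty(FACTORY_KEY);
        if (className == null) {
            return new DefaultToolFactory();
        }
        try {
            return Class.forName(className)
                    .asSubclass(ToolFactory.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load factory: " + className, e);
        }
    }
}

class FactoryLoadingDemo {
    public static void main(String[] args) {
        Properties modelConfig = new Properties();
        // Training side: remember which factory was used.
        modelConfig.setProperty(FactoryLoader.FACTORY_KEY, CustomToolFactory.class.getName());
        // Runtime side: the same factory is restored from the configuration.
        System.out.println(FactoryLoader.load(modelConfig).featureGeneratorName());
    }
}
```

Storing only the class name keeps the model free of executable code; the
class itself still has to be on the classpath wherever the model is loaded.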

Thanks,
William

On Tue, Jun 21, 2011 at 11:58 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 6/14/11 4:23 AM, william.colen@gmail.com wrote:
>
>> Hi,
>>
>> Currently, custom feature generators that can be passed from the command
>> line are implemented only for NameFinder, but it would be very nice to
>> have them for all tools.
>> The Thai sentence detector customization is nice and simple, but to do
>> something similar for other languages the user would need to branch the
>> code. We should allow users to pass a factory class name from the command
>> line. Maybe we could do it for every tool that doesn't use a sequence
>> feature generator. It would also be nice to save the factory class name
>> in the model to make sure we are using the same feature generator during
>> runtime and evaluation.
>>
>> What do you think? Maybe you have thought of a better solution for that.
>>
>
> The first approach OpenNLP came up with to customize the feature generation
> of a component was to simply pass in a context generator. Well, that does
> not really work with the new model packages and the command line.
> We never really came up with a solution to this problem or discussed it.
>
> William suggests that we should use a class name to load a factory class.
> And I think we should then also remove the support for passing in a
> context generator.
>
> I believe it is a good way of solving the issue, since the model can then
> be used by any code which integrates OpenNLP and has an additional jar on
> the classpath. That will, for example, work well with our UIMA integration.
>
> These models might not be well suited for distribution to a wider group of
> people, since they always need the factory class, which we cannot put
> inside the model because of security issues.
>
> For components where we need to adapt the feature generation to a language,
> I still suggest that we continue to define default feature generation which
> is dependent on the language, as we already do for Thai in the sentence
> detector.
>
> Well, I am not yet sure how it should be done for the parser, doccat and
> coref.
>
> Jörn
>

Re: Custom feature generators

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Hi, Jörn,

On Tue, Feb 7, 2012 at 12:56 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 2/7/12 3:35 PM, william.colen@gmail.com wrote:
>
>> And what about sequence validators? How can the default one be replaced?
>>
>
> Maybe we should make a default Factory which people can sub-class: if they
> want to modify the sequence validator, they create a different one. Anyway,
> that could also be done via sub-classing the component itself (the way we
> are currently doing it), and then the Factory would only be responsible
> for instantiating the sub-classed component.
>
I like the option of sub-classing the component itself; it would be more
flexible. But what about the static train methods?

William

Re: Custom feature generators

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
> I would like to work on this now: passing a Factory class name to the CLI
> tools and saving it in the model as configuration.
> Do you still think it is a good idea, or should we find a better way to
> load custom feature generators and custom sequence validators? I would like
> to do it for the SentenceDetector and POS Tagger for now.
>
+1


> A very important point is that we can reuse the code to instantiate a
> component over and over again without modifying it for a customization.
> This way all the models will work wherever OpenNLP is integrated and
> the extension jar files are on the classpath.
>

I like these two. I needed this myself a year or so ago and had to invent
workarounds.

Aliaksandr

Re: Custom feature generators

Posted by Jörn Kottmann <ko...@gmail.com>.
On 2/7/12 3:35 PM, william.colen@gmail.com wrote:
> And what about sequence validators? How can the default one be replaced?

Maybe we should make a default Factory which people can sub-class: if they
want to modify the sequence validator, they create a different one. Anyway,
that could also be done via sub-classing the component itself (the way we
are currently doing it), and then the Factory would only be responsible
for instantiating the sub-classed component.
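
The first option could look roughly like the sketch below. All names in it
(SimpleSequenceValidator, DefaultTaggerFactory, NoRepeatTaggerFactory) are
hypothetical and defined only for this example; the "no repeated outcome"
rule is an arbitrary illustration, not something any OpenNLP component does.

```java
// Hypothetical validator interface, defined here only for illustration.
interface SimpleSequenceValidator {
    // Returns true if 'outcome' may follow the outcomes chosen so far.
    boolean validSequence(int index, String[] tokens, String[] previousOutcomes, String outcome);
}

// Hypothetical default factory: its validator accepts every sequence.
class DefaultTaggerFactory {
    SimpleSequenceValidator createSequenceValidator() {
        return (index, tokens, previousOutcomes, outcome) -> true;
    }
}

// A user sub-class replaces only the validator and inherits everything else.
class NoRepeatTaggerFactory extends DefaultTaggerFactory {
    @Override
    SimpleSequenceValidator createSequenceValidator() {
        // Illustrative rule: the same outcome may not occur twice in a row.
        return (index, tokens, previousOutcomes, outcome) ->
                index == 0 || !outcome.equals(previousOutcomes[index - 1]);
    }
}

class SequenceValidatorDemo {
    public static void main(String[] args) {
        String[] tokens = {"the", "dog"};
        String[] previous = {"DET"};
        SimpleSequenceValidator validator =
                new NoRepeatTaggerFactory().createSequenceValidator();
        // Candidate outcomes for the second token, checked against the first.
        System.out.println(validator.validSequence(1, tokens, previous, "DET"));
        System.out.println(validator.validSequence(1, tokens, previous, "NOUN"));
    }
}
```

Because the validator is created through an instance method, a sub-class can
override it, which is not possible with static methods.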

> The factory should be used to load custom resources, like a different
> implementation of a dictionary, am I right?
>
Yes, but it could also be something else.

A very important point is that we can reuse the code to instantiate a
component over and over again without modifying it for a customization.
This way all the models will work wherever OpenNLP is integrated and
the extension jar files are on the classpath.

Jörn

Re: Custom feature generators

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
And what about sequence validators? How can the default one be replaced?

The factory should be used to load custom resources, like a different
implementation of a dictionary, am I right?

Thank you,
William

Re: Custom feature generators

Posted by Joern Kottmann <ko...@gmail.com>.
Yes, let's see what we could do.

The name finder already supports custom feature generation;
the same feature generation code could be reused by the POS Tagger.
This is actually already half done.

One of the current limitations is that we cannot store "custom" resources
in a model. If we specify some kind of Factory class, it would be nice if
it could help us locate the Artifact Serializer for a custom resource.

We could define one Factory class per component which is able to influence
how this component is created from the model.
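
As a rough sketch of what "locating the serializer through the factory"
could mean: the code below is an assumption-laden illustration, not the
OpenNLP ArtifactSerializer API. ResourceSerializer, ComponentFactory,
WordListFactory, the byte-array signatures and the "wordlist" extension are
all hypothetical. The factory maps the extension of a model entry name to
the serializer that can read that entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical serializer contract, defined only for this example.
interface ResourceSerializer<T> {
    T read(byte[] data);
    byte[] write(T resource);
}

// Example custom resource: a newline-separated word list.
class WordListSerializer implements ResourceSerializer<List<String>> {
    public List<String> read(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        List<String> words = new ArrayList<>();
        if (!text.isEmpty()) {
            words.addAll(Arrays.asList(text.split("\n")));
        }
        return words;
    }

    public byte[] write(List<String> words) {
        return String.join("\n", words).getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical per-component factory: maps entry extensions to serializers.
class ComponentFactory {
    Map<String, ResourceSerializer<?>> createSerializers() {
        return new HashMap<>();
    }
}

// A user factory registers the serializer for its custom resource.
class WordListFactory extends ComponentFactory {
    @Override
    Map<String, ResourceSerializer<?>> createSerializers() {
        Map<String, ResourceSerializer<?>> serializers = super.createSerializers();
        serializers.put("wordlist", new WordListSerializer());
        return serializers;
    }
}

class ModelEntryReader {
    // Picks the serializer for a model entry based on its file extension.
    static ResourceSerializer<?> serializerFor(ComponentFactory factory, String entryName) {
        int dot = entryName.lastIndexOf('.');
        String extension = dot < 0 ? "" : entryName.substring(dot + 1);
        ResourceSerializer<?> serializer = factory.createSerializers().get(extension);
        if (serializer == null) {
            throw new IllegalArgumentException("No serializer for entry: " + entryName);
        }
        return serializer;
    }
}

class SerializerLookupDemo {
    public static void main(String[] args) {
        byte[] stored = new WordListSerializer().write(Arrays.asList("Dr", "Mr"));
        ResourceSerializer<?> found =
                ModelEntryReader.serializerFor(new WordListFactory(), "abbreviations.wordlist");
        System.out.println(found.read(stored));
    }
}
```

With this shape the generic model loading code never needs to know about
the custom resource type; only the factory does.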

What do you think?

Jörn
