Posted to users@opennlp.apache.org by "Markus M. Berg" <mm...@web.de> on 2017/01/18 12:38:25 UTC

NameFinderME with custom FeatureGenerators

Dear all,
 
I am trying to train the NameFinderME using a custom set of feature generators. However, I am not able to add the feature generators to the name finder.
 
Here is what I do:
As described in the documentation (https://opennlp.apache.org/documentation/1.7.0/manual/opennlp.html#tools.namefind.training.featuregen), I used the following code to set up the list of feature generators:
 
   AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
           new AdaptiveFeatureGenerator[]{
           new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
           new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
           new OutcomePriorFeatureGenerator(),
           new PreviousMapFeatureGenerator(),
           new BigramNameFeatureGenerator(),
           new SentenceFeatureGenerator(true, false),
           new BrownTokenFeatureGenerator(dictResource) // dictResource: a BrownCluster instance
           });
 
The documentation then explains that "the TokenNameFinderFactory allows to specify a custom feature generator".
However, I don't know how to do this, since there is no add method and no constructor parameter of type AdaptiveFeatureGenerator.
 
   TokenNameFinderFactory factory = new TokenNameFinderFactory();
   ... // how to add the feature generator?
   model = NameFinderME.train("en", "default", sampleStream, TrainingParameters.defaultParams(), factory);
 
In an older release of OpenNLP, it was possible to pass the feature generators to the train method like this:
 
   train(String languageCode, String type, ObjectStream<NameSample> samples,
       TrainingParameters trainParams, AdaptiveFeatureGenerator generator, final Map<String, Object> resources)
 
But this is no longer possible. Can anybody describe the new way to implement this? An example would be great!
 
I only found this:

   public TokenNameFinderFactory(byte[] featureGeneratorBytes,
                              Map<String,Object> resources,
                              SequenceCodec<String> seqCodec)
 
But I don’t know what parameters to pass (why a byte array? SequenceCodec?)...
 
Any help is appreciated,
Thanks in advance!
 
Best,
Markus

Re: NameFinderME with custom FeatureGenerators

Posted by Joern Kottmann <ko...@gmail.com>.

Hello Markus,

the TokenNameFinderTrainerTool is part of the cmdline package and not public
API. You should not use it. A good alternative for reading the descriptor is,
for example, Files.readAllBytes.
Otherwise that's how you should do it. And we should look into adding more
constructors to the TokenNameFinderFactory to make this a bit nicer for
you.
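
A sketch of the Files.readAllBytes route, using only the JDK so that it runs without OpenNLP on the classpath. The class name DescriptorLoader and the file name featuregen.xml are placeholders, not part of OpenNLP; the commented lines simply reuse the constructor and train call quoted elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DescriptorLoader {

    // Reads the whole feature generator XML descriptor into a byte array.
    public static byte[] loadDescriptor(Path descriptorFile) throws IOException {
        return Files.readAllBytes(descriptorFile);
    }

    public static void main(String[] args) throws IOException {
        // "featuregen.xml" is a placeholder path for the descriptor file.
        Path descriptorFile = Paths.get(args.length > 0 ? args[0] : "featuregen.xml");
        byte[] featureGeneratorBytes = loadDescriptor(descriptorFile);
        System.out.println("Read " + featureGeneratorBytes.length + " descriptor bytes");

        // With OpenNLP on the classpath, the bytes are passed on as in the thread:
        // TokenNameFinderFactory factory =
        //     new TokenNameFinderFactory(featureGeneratorBytes, null, codec);
        // model = NameFinderME.train("en", "default", sampleStream,
        //     TrainingParameters.defaultParams(), factory);
    }
}
```

This replaces the call to openFeatureGeneratorBytes() from the cmdline package with a plain JDK call, so no non-public API is needed.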

B and C are not possible with the API we have currently, but you can take
the model apart yourself and look at the contents.

HTH,
Jörn


On Thu, Jan 19, 2017 at 12:01 PM, Markus M. Berg <mm...@web.de> wrote:

> Hi,
>
> I just want to use the existing name finder with custom features. With the
> cmd line I can get the custom set of features running. Thanks for that.
> However, I want to be able to retrain the model dynamically, i.e. via
> source code.
>
> I am now using the XML file for defining the set of custom features
> instead of instantiating it via the AdaptiveFeatureGenerator. I then use
> the method openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool
> to convert it to a byte array which I can then pass to the
> TokenNameFinderFactory like this:
>
>     TokenNameFinderFactory factory = new TokenNameFinderFactory(
> openFeatureGeneratorBytes(featureGenFile),null, codec);
>
> a) Is this approach alright or would you recommend something else?
>
> b) Another question: Is it possible to somehow see the computed feature
> vector for every token (during training and prediction)?
>
> c) And out of curiosity: Is it possible to see how much a feature
> contributes to the final decision? I want to identify features that are
> useless and those which may lead to wrong predictions.
>
> Thank you very much for your help again!
>
> Best regards,
> Markus
>
>

Re: NameFinderME with custom FeatureGenerators

Posted by "Markus M. Berg" <mm...@web.de>.

Hi,

I just want to use the existing name finder with custom features. With the cmd line I can get the custom set of features running. Thanks for that. However, I want to be able to retrain the model dynamically, i.e. via source code.

I am now using the XML file for defining the set of custom features instead of instantiating it via the AdaptiveFeatureGenerator. I then use the method openFeatureGeneratorBytes() from the TokenNameFinderTrainerTool to convert it to a byte array which I can then pass to the TokenNameFinderFactory like this:

    TokenNameFinderFactory factory = new TokenNameFinderFactory(openFeatureGeneratorBytes(featureGenFile), null, codec);

a) Is this approach alright or would you recommend something else?

b) Another question: Is it possible to somehow see the computed feature vector for every token (during training and prediction)?

c) And out of curiosity: Is it possible to see how much a feature contributes to the final decision? I want to identify features that are useless and those which may lead to wrong predictions.

Thank you very much for your help again!

Best regards,
Markus

 

Re: NameFinderME with custom FeatureGenerators

Posted by Joern Kottmann <ko...@gmail.com>.

Hello,

it really depends on what you are trying to achieve.

Maybe you know exactly what you want; in that case I would recommend
sub-classing the TokenNameFinderFactory, where you could override the method to
create the feature generators. The default constructor is fine. The name
finder supports different encodings, currently Bio and Bilou. You would
need to pass a reference to one of those classes, or just use the default
(which is Bio).

If you just want to have the name finder with custom feature generation I
would suggest defining an XML descriptor for it and just using our cmd line
interface to build the model. The cmd line interface has the advantage that
you can use all the tools without coding yourself, especially evaluation
and cross validation should be interesting for you.

TokenNameFinderFactory(byte[] featureGeneratorBytes,
                              Map<String,Object> resources,
                              SequenceCodec<String> seqCodec)

The byte[] is supposed to contain the feature generator xml bytes.
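
A minimal sketch of what such a byte array might hold, again using only the JDK. The descriptor mirrors the feature set from the original question (without the Brown cluster generator, which needs an extra resource) and uses the element names from the sample descriptor in the OpenNLP manual; treat the exact element set as an assumption to check against the release in use. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class FeatureGenDescriptor {

    // Descriptor XML; element names are assumed from the OpenNLP manual's sample.
    static final String DESCRIPTOR_XML =
          "<generators>\n"
        + "  <cache>\n"
        + "    <generators>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\"><token/></window>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\"><tokenclass/></window>\n"
        + "      <definition/>\n"
        + "      <prevmap/>\n"
        + "      <bigram/>\n"
        + "      <sentence begin=\"true\" end=\"false\"/>\n"
        + "    </generators>\n"
        + "  </cache>\n"
        + "</generators>\n";

    // The byte[] is nothing more than the encoded XML text.
    public static byte[] descriptorBytes() {
        return DESCRIPTOR_XML.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] featureGeneratorBytes = descriptorBytes();
        System.out.println(featureGeneratorBytes.length + " descriptor bytes");

        // With OpenNLP on the classpath, these bytes would be the first
        // argument of the constructor quoted above:
        // new TokenNameFinderFactory(featureGeneratorBytes, resources, seqCodec);
    }
}
```

Keeping the descriptor in a file instead of a string works the same way; only the source of the bytes changes.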

HTH,
Jörn

