Posted to dev@opennlp.apache.org by "Manoj B. Narayanan" <ma...@gmail.com> on 2017/12/19 13:27:14 UTC

Feature Manipulation for Maxent model in NER

Hi all,

I tried varying the custom features we provide to the model. I have a few
queries regarding it.

1. Will the probability for a particular feature be affected if I add it
multiple times?
     E.g., if I add a feature 'pos=NN' multiple times, will it have an impact
on model performance?

2. What if I add the same feature under a different name?
     E.g., I add both 'pos=NN' and 'partsOfSpeech=NN'; what will be the impact?
These two always co-occur, so how will the model treat them?

3. How does the model learn the features? Please give a small example.

4. What if we can add classes to the features?
     E.g., certain features can take only a certain set of values. If we are
able to label them, can we make the model learn features according to the
labels?
     Say, I have a) a pos feature and b) a dictionary feature.
     If the probability is calculated with respect to the corresponding
class (pos / dictionary) and then the overall probability is calculated,
how will the model behave?
    Instead of giving a single string as a feature, what if we give a
key-value pair as a feature?

Looking forward to discussion on these.

Thanks,
Manoj.

Re: Feature Manipulation for Maxent model in NER

Posted by "Manoj B. Narayanan" <ma...@gmail.com>.
Thanks Dan.

-- 
Regards,
Manoj.

Re: Feature Manipulation for Maxent model in NER

Posted by Dan Russ <da...@gmail.com>.
> On Dec 19, 2017, at 8:27 AM, Manoj B. Narayanan <ma...@gmail.com> wrote:
> 
> Hi all,
> 
> I tried varying the custom features we provide to the model. I have a few
> queries regarding it.
> 
> 1. Will the probability for a particular feature be affected if I add it
> multiple times?
>     E.g., if I add a feature 'pos=NN' multiple times, will it have an impact
> on model performance?

Yes, but not necessarily in the way you expect.  Remember that maxent tries to maximize the probability of being correct given the training data.  The effect of duplicating a pos=NN feature is much easier to see in a Naive Bayes model than in maxent.
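
To make that concrete (a sketch of the intuition, not a claim about how OpenNLP's data indexer counts duplicates): in the maxent score, a duplicated predicate only multiplies the contribution of its weight,

    score(outcome) = ... + 2 * w(pos=NN, outcome) + ...

so training can compensate by shrinking that weight, and the final model may be nearly unchanged.  In Naive Bayes the duplicated evidence multiplies in directly: P(document | outcome) picks up a factor of P(pos=NN | outcome)^2, so the effect is plain to see.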

> 
> 2. What if I add the same feature under a different name?
>     E.g., I add both 'pos=NN' and 'partsOfSpeech=NN'; what will be the impact?
> These two always co-occur, so how will the model treat them?
> 

Maxent is not really affected by correlated terms.  It will split the weights among the terms.  Your two terms (pos=NN and partsOfSpeech=NN) are completely correlated terms.
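
A quick way to see the weight splitting: if two predicates always fire together, only the sum of their weights enters the score,

    w1 * f(x) + w2 * f(x) = (w1 + w2) * f(x)

so any split (w1, w2) with the same sum yields an identical model, and the trainer simply settles on one of them.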

> 3. How does the model learn the features? Please give a small example.

There are two training methods in OpenNLP: GIS and L-BFGS.  Essentially you are trying to maximize P(correct outcome | features), where for outcome j

    P(outcome_j | features) = exp( sum_i( w_ij * x_i ) ) / sum_k exp( sum_i( w_ik * x_i ) )

Both GIS and L-BFGS are iterative methods for finding the weights that maximize the probability of the correct outcomes over the training data.
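
A minimal sketch of that computation in Java (illustrative only, not the OpenNLP API; weights[j][i] holds w_ij, and x[i] is typically 1.0 when predicate i fires in the context):

    // P(outcome_j | features) = exp(sum_i w_ij * x_i) / sum_k exp(sum_i w_ik * x_i)
    static double[] conditionalProbs(double[][] weights, double[] x) {
        double[] p = new double[weights.length];
        double z = 0.0;
        for (int j = 0; j < weights.length; j++) {
            double score = 0.0;
            for (int i = 0; i < x.length; i++) {
                score += weights[j][i] * x[i];   // sum_i w_ij * x_i
            }
            p[j] = Math.exp(score);              // unnormalized probability
            z += p[j];                           // Z = sum over all outcomes
        }
        for (int j = 0; j < p.length; j++) {
            p[j] /= z;                           // normalize so probabilities sum to 1
        }
        return p;
    }

GIS and L-BFGS differ only in how they search for the weights; the probability they plug those weights into is the same.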

> 
> 4. What if we can add classes to the features?
>     E.g., certain features can take only a certain set of values.

This is actually an incorrect statement.  A feature is a function that indicates the PRESENCE of something WITH A PARTICULAR OUTCOME.  What does that mean?  If your “document” is “The cat” and you have two outcomes (“animal” and “mineral”), then there are 2 features associated with the word “cat”: F(“CAT”, animal) and F(“CAT”, mineral).  These features each have different weights.
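
Plugging that example into the formula above: with w_a as the weight of F(“CAT”, animal) and w_m as the weight of F(“CAT”, mineral),

    P(animal | “cat”) = exp(w_a) / ( exp(w_a) + exp(w_m) )

so the two features compete for the same word directly through their weights.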

> If we are
> able to label them, can we make the model learn features according to the
> labels?

Given what I said about features earlier, yes you can (you actually always do this).  In OpenNLP, if you add a term (actually a predicate, but I won’t call it that) to an event context, you are associating that term with an outcome (a feature).  (But understand that all potential features exist; they just have a weight of 0.)
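
In code, that association looks roughly like this (a sketch assuming opennlp.tools.ml.model.Event from a recent OpenNLP release; older releases keep the same class in a different package):

    import opennlp.tools.ml.model.Event;

    // One training event: the observed outcome plus every predicate that
    // fired in this context.  Each predicate is a potential feature for
    // EVERY outcome; only the (predicate, outcome) pairs seen in training
    // accumulate evidence, the rest keep a weight of 0.
    Event event = new Event("animal", new String[] { "w=cat", "pos=NN" });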

>     Say, I have a) a pos feature and b) a dictionary feature.
>     If the probability is calculated with respect to the corresponding
> class (pos / dictionary) and then the overall probability is calculated,
> how will the model behave?

Not sure I understand your question.  

>    Instead of giving a single string as a feature, what if we give a
> key-value pair as a feature?

If you give a=b, that is a single feature; a=c is another feature that has nothing to do with a=b.

> 
> Looking forward to discussion on these.
> 
> Thanks,
> Manoj.