Posted to user@mahout.apache.org by Weihua Zhu <wz...@adconion.com> on 2011/07/11 23:08:48 UTC

combination of features worsen the performance

Hi all,

I am using Mahout logistic regression for classification. Interestingly, features A and B each perform satisfactorily on their own (say 65% and 80%), but when I combine them (using an encoder), the performance is about 72%. Shouldn't the combined performance be better? Any thoughts? Thanks a lot,


-wz.

Re: combination of features worsen the performance

Posted by Weihua Zhu <wz...@adconion.com>.
Hi Ted,

Thanks very much for your detailed reply. It is very helpful.
I still have some questions. I hope I am not polluting this mailing list too much.

I understand all your comments except the one below:
> Finally, you should be combining group ranking objective as well as
> regression objectives.  Otherwise, your model will simply be learning which
> users are likely to click on anything and those users who will never click
> on anything.  There are provisions for segmented AUC in the code, but that
> will only work for binary targets.  In general, it is common to build
> cascaded models to deal with this.  The first model learns to predict click
> and the cascaded model learns conversion conditional on click.

We can use binary targets; that shouldn't be a problem.
Could you say a little more about "segmented AUC", and about the cascaded models?
Do you have any reference papers, books, code samples, or example projects for recommendation?
I have the Mahout in Action book, but I didn't see material like that in it.
Thanks again for your help.


-Weihua
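
On the "segmented AUC" question, one common reading is: compute AUC separately inside each group (here, each user) and average the results, so that a model gets no credit for merely knowing which users click a lot. The following is a minimal sketch under that assumption. It is plain Java, not Mahout code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; these names are hypothetical and not Mahout classes.
final class GroupedAucSketch {

    // AUC for one set of rows by exhaustive pair counting.
    // Each row is {score, label} with label 1.0 (positive) or 0.0 (negative).
    // Returns NaN when the set lacks positives or negatives.
    static double auc(List<double[]> rows) {
        double wins = 0.0;
        long pairs = 0;
        for (double[] pos : rows) {
            if (pos[1] != 1.0) continue;
            for (double[] neg : rows) {
                if (neg[1] != 0.0) continue;
                pairs++;
                if (pos[0] > neg[0]) wins += 1.0;
                else if (pos[0] == neg[0]) wins += 0.5; // ties count as half
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }

    // Mean of the per-group AUCs, skipping groups where AUC is undefined.
    static double groupedAuc(String[] groups, double[] scores, int[] labels) {
        Map<String, List<double[]>> byGroup = new HashMap<>();
        for (int i = 0; i < groups.length; i++) {
            byGroup.computeIfAbsent(groups[i], k -> new ArrayList<>())
                   .add(new double[] {scores[i], labels[i]});
        }
        double sum = 0.0;
        int used = 0;
        for (List<double[]> rows : byGroup.values()) {
            double a = auc(rows);
            if (!Double.isNaN(a)) { sum += a; used++; }
        }
        return used == 0 ? Double.NaN : sum / used;
    }

    public static void main(String[] args) {
        // A model that has only learned "u1 clicks a lot, u2 rarely clicks".
        String[] users  = {"u1", "u1", "u1", "u2", "u2", "u2"};
        double[] scores = {0.9, 0.9, 0.9, 0.1, 0.1, 0.1};
        int[] clicked   = {1, 1, 0, 0, 0, 1};
        String[] oneGroup = {"all", "all", "all", "all", "all", "all"};
        System.out.println("global AUC  = " + groupedAuc(oneGroup, scores, clicked));
        System.out.println("grouped AUC = " + groupedAuc(users, scores, clicked));
    }
}
```

With this made-up data the global AUC is about 0.67 while the grouped AUC is 0.5: the scores separate heavy clickers from light ones but carry no ranking information within a user.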


On Jul 11, 2011, at 3:30 PM, Ted Dunning wrote:

> There are lots of problems with the problem as posed.  I am not surprised
> with poor results.
> 
> You should not downsample negative examples so severely.  I would keep as
> many as 10-30 x as many positive examples you have.  Even then, I suspect
> you don't have enough data especially if you have already included data for
> all of your models.
> 
> Your Feature A is not useful unless you are putting all ad results together.
>  Even then, you need to include more advertiser, campaign and ad specific
> features.
> 
> The feature vector size of 10,000 is actually relatively small if you have
> any reasonable degree of sparsity in your user and ad features.  Unused
> features do not hurt learning.
> 
> Finally, you should be combining group ranking objective as well as
> regression objectives.  Otherwise, your model will simply be learning which
> users are likely to click on anything and those users who will never click
> on anything.  There are provisions for segmented AUC in the code, but that
> will only work for binary targets.  In general, it is common to build
> cascaded models to deal with this.  The first model learns to predict click
> and the cascaded model learns conversion conditional on click.
> 
> Most importantly, really, I would recommend that you experiment with model
> design using a system like R so that you can get fast turn-around on
> modeling efforts.
> 


Re: combination of features worsen the performance

Posted by Ted Dunning <te...@gmail.com>.
There are lots of problems with the problem as posed.  I am not surprised
by the poor results.

You should not downsample negative examples so severely.  I would keep as
many as 10-30x as many negative examples as you have positive examples.  Even
then, I suspect you don't have enough data, especially if you have already
included data for all of your models.
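
A minimal sketch of that kind of downsampling, assuming the positives and negatives are already held in two in-memory lists. The helper name, the ratio of 20, and the example counts are hypothetical choices for illustration; nothing here is a Mahout API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Mahout.
final class NegativeDownsampler {

    // Keeps every positive and at most ratio * positives.size() randomly chosen negatives.
    static <T> List<T> downsample(List<T> positives, List<T> negatives, int ratio, long seed) {
        List<T> shuffled = new ArrayList<>(negatives);
        Collections.shuffle(shuffled, new Random(seed));
        int keep = (int) Math.min((long) shuffled.size(), (long) ratio * positives.size());
        List<T> result = new ArrayList<>(positives);
        result.addAll(shuffled.subList(0, keep));
        // Mix the classes so an online (SGD) learner does not see one class in a long run.
        Collections.shuffle(result, new Random(seed + 1));
        return result;
    }

    public static void main(String[] args) {
        List<String> pos = new ArrayList<>();
        List<String> neg = new ArrayList<>();
        for (int i = 0; i < 250; i++) pos.add("pos" + i);
        for (int i = 0; i < 100000; i++) neg.add("neg" + i);
        List<String> train = downsample(pos, neg, 20, 42L);
        System.out.println(train.size()); // 250 positives + 20 * 250 negatives = 5250
    }
}
```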

Your Feature A is not useful unless you are putting all ad results together.
Even then, you need to include more advertiser-, campaign- and ad-specific
features.

The feature vector size of 10,000 is actually relatively small if you have
any reasonable degree of sparsity in your user and ad features.  Unused
features do not hurt learning.
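
To make the sparsity point concrete, here is a rough sketch of hashed feature encoding in plain Java. It is not Mahout's encoder implementation (Mahout's own encoders differ in detail); the class name, the "name=value" hashing scheme, and the example feature values are assumptions for illustration. Each example touches only a handful of the 10,000 slots, and the feature name is part of the hashed key so campaign IDs and user attributes do not share slots by construction.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only; not Mahout's encoder.
final class HashedEncoderSketch {
    static final int FEATURES = 10_000;

    // Adds weight 1.0 at the slot chosen by hashing "name=value".
    // The vector is kept sparse as a map from slot index to value.
    // Returns the slot index that was updated.
    static int encode(Map<Integer, Double> vector, String name, String value) {
        int index = Math.floorMod((name + "=" + value).hashCode(), FEATURES);
        vector.merge(index, 1.0, Double::sum);
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, Double> v = new HashMap<>();
        encode(v, "campaign", "c1234"); // feature A (made-up campaign ID)
        encode(v, "gender", "male");    // feature B
        encode(v, "country", "us");     // feature B
        System.out.println(v.size() + " non-zero slots out of " + FEATURES);
    }
}
```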

Finally, you should be combining group ranking objective as well as
regression objectives.  Otherwise, your model will simply be learning which
users are likely to click on anything and those users who will never click
on anything.  There are provisions for segmented AUC in the code, but that
will only work for binary targets.  In general, it is common to build
cascaded models to deal with this.  The first model learns to predict click
and the cascaded model learns conversion conditional on click.
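
A small sketch of how the two cascaded stages could be combined at prediction time, assuming each stage is a binary logistic model. The first stage is trained on all impressions (click vs. no click) and the second only on clicked impressions (buy vs. no buy). The class name, weights, and feature vector are made up for illustration; this is plain Java, not Mahout code.

```java
// Illustrative only; names are hypothetical and not Mahout classes.
final class CascadeSketch {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability from a binary logistic model with weights w on features x.
    static double predict(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return sigmoid(z);
    }

    // Combines the two stages into {P(no click), P(click, no buy), P(buy)}.
    static double[] cascade(double pClick, double pBuyGivenClick) {
        return new double[] {
            1.0 - pClick,
            pClick * (1.0 - pBuyGivenClick),
            pClick * pBuyGivenClick
        };
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, 0.0};             // made-up features (first entry is a bias term)
        double[] clickWeights = {-2.0, 1.0, 0.3}; // made-up stage 1 weights
        double[] buyWeights = {-1.5, 0.4, 0.8};   // made-up stage 2 weights
        double[] p = cascade(predict(clickWeights, x), predict(buyWeights, x));
        System.out.printf("no click=%.3f click only=%.3f buy=%.3f%n", p[0], p[1], p[2]);
    }
}
```

The three outputs always sum to 1, so the cascade covers the same three classes (no click, click, buy) as a single 3-class model while letting each stage be trained and evaluated as a binary problem.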

Most importantly, really, I would recommend that you experiment with model
design using a system like R so that you can get fast turn-around on
modeling efforts.

On Mon, Jul 11, 2011 at 3:04 PM, Weihua Zhu <wz...@adconion.com> wrote:

> hi Thanks Ted.
> I understand that the training dataset size is small. The reason is that we
> have very limited number of "action" class events/instances.  We also want
> to make each target class have equal number of events/instances.
> Feature A is the advertisement campaign ID, and Feature B is the behaviors
> that internet user has, for example, gender:male, country: us, etc.
> I set the size of the encoder to 10000, which is very large.
> I used this setup for  OnlineLogisticRegressioN:
>        olr = new OnlineLogisticRegression(3, FEATURES, new L1());
>        olr.alpha(1).stepOffset(1000).lambda(3e-5).learningRate(3);
>
> Thanks.
>
> -wz
>
>

Re: combination of features worsen the performance

Posted by Weihua Zhu <wz...@adconion.com>.
Hi, thanks Ted.
I understand that the training dataset is small. The reason is that we have a very limited number of "action" class events/instances.  We also want each target class to have an equal number of events/instances.
Feature A is the advertisement campaign ID, and Feature B is the behaviors that the internet user has, for example, gender: male, country: us, etc.
I set the size of the encoder to 10000, which is very large.
I used this setup for OnlineLogisticRegression:
        olr = new OnlineLogisticRegression(3, FEATURES, new L1());
        olr.alpha(1).stepOffset(1000).lambda(3e-5).learningRate(3);
 
Thanks.

-wz


On Jul 11, 2011, at 2:49 PM, Ted Dunning wrote:

> This is a tiny amount of data.  The regularization in Mahout's SGD
> implementation is probably not as effective as second order techniques for
> such tiny data.
> 
> Btw... you didn't answer my questions about what kind of data feature A and
> B are.  I understand that you might be shy about this, but without that kind
> of information, I can't help you.
> 
> (and add this additional question)
> 
> What is the size of the encoded vector?
> 


Re: combination of features worsen the performance

Posted by Ted Dunning <te...@gmail.com>.
This is a tiny amount of data.  The regularization in Mahout's SGD
implementation is probably not as effective as second order techniques for
such tiny data.

Btw... you didn't answer my questions about what kind of data feature A and
B are.  I understand that you might be shy about this, but without that kind
of information, I can't help you.

(and add this additional question)

What is the size of the encoded vector?

On Mon, Jul 11, 2011 at 2:26 PM, Weihua Zhu <wz...@adconion.com> wrote:

> Target class is if a user click an ad(advertisement), buy through an ad, or
> not; so 3 classes.
> Feature A s about the Advertisement itself;
> Feature B is about the user's behaviors;
> Currently im only using feature A and B.
> Total training data is 250 for each class;
>
> thanks..
>
>

Re: combination of features worsen the performance

Posted by Weihua Zhu <wz...@adconion.com>.
Thanks. We are trying to get a larger dataset, probably over 2000 examples for each class.
What do you mean by "the errors on performance estimates"? The confusion matrix?


On Jul 11, 2011, at 2:44 PM, Konstantin Shmakov wrote:

> It seems  that training data set is way too small. What are the errors
> on performance estimates?
> 
> --
> 


Re: combination of features worsen the performance

Posted by Konstantin Shmakov <ks...@gmail.com>.
It seems that the training data set is way too small. What are the errors
on the performance estimates?
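
One way to read "errors on performance estimates" is the sampling uncertainty of a measured accuracy. A rough sketch using the normal approximation to the binomial follows. The evaluation set size of 250 is an assumption chosen for illustration (the thread does not say how many held-out examples were used), and the class name is hypothetical.

```java
// Illustrative only; assumes accuracy is measured on n independent held-out examples.
final class AccuracyError {

    // Standard error of an accuracy estimate under the normal approximation.
    static double standardError(double accuracy, int n) {
        return Math.sqrt(accuracy * (1.0 - accuracy) / n);
    }

    public static void main(String[] args) {
        int n = 250; // assumed evaluation set size
        for (double acc : new double[] {0.65, 0.72, 0.80}) {
            double half = 1.96 * standardError(acc, n); // approximate 95% half-width
            System.out.printf("accuracy %.2f: about %.3f to %.3f%n", acc, acc - half, acc + half);
        }
    }
}
```

Under that assumption each estimate is only good to roughly plus or minus 5 to 6 percentage points, so the approximate intervals for 72% and 80% overlap and the observed drop may not be a real difference.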

--

On Mon, Jul 11, 2011 at 2:26 PM, Weihua Zhu <wz...@adconion.com> wrote:
> Target class is if a user click an ad(advertisement), buy through an ad, or not; so 3 classes.
> Feature A s about the Advertisement itself;
> Feature B is about the user's behaviors;
> Currently im only using feature A and B.
> Total training data is 250 for each class;
>
> thanks..
>
>



-- 
ksh:

RE: combination of features worsen the performance

Posted by Weihua Zhu <wz...@adconion.com>.
The target class is whether a user clicks an ad (advertisement), buys through an ad, or does neither; so 3 classes.
Feature A is about the advertisement itself;
Feature B is about the user's behaviors;
Currently I'm only using features A and B.
There are 250 training examples for each class;

thanks..


________________________________________
From: Ted Dunning [ted.dunning@gmail.com]
Sent: Monday, July 11, 2011 2:15 PM
To: user@mahout.apache.org
Subject: Re: combination of features worsen the performance

Can you say a little bit about the data?

What are features A and B?  What kind of data do they represent?

How many other features are there?

What is the target variable?  How many possible values does it have?

How much training data do you have?

What sort of training are you doing?




Re: combination of features worsen the performance

Posted by Ted Dunning <te...@gmail.com>.
Can you say a little bit about the data?

What are features A and B?  What kind of data do they represent?

How many other features are there?

What is the target variable?  How many possible values does it have?

How much training data do you have?

What sort of training are you doing?



On Mon, Jul 11, 2011 at 2:08 PM, Weihua Zhu <wz...@adconion.com> wrote:

> Hi, Dear all,
>
>  I am using mahout logistic regression for classification; interestingly,
> for feature A, B, individually each has satisfactory performances, say 65%,
> 80%, but when i combine them together(using encoder), the performance is
> like 72%. Shouldn't the performance be better? Any thoughts? Thanks a lot,
>
>
> -wz.
>