Posted to user@mahout.apache.org by Stanley Xu <we...@gmail.com> on 2011/04/19 10:33:16 UTC

How could I set a loss function in SGD?

Dear All,

I am trying to use the SGD in Mahout to do an experiment on CTR prediction.
I am wondering how I could set a loss function for the algorithm, and what
default loss function the SGD is using. I haven't had a chance to read the
paper and code in detail, only to go through them quickly. It looks like
the SGD in Mahout just tries to maximize the log likelihood of the model.

What should I do if I wanted to add a penalty for when a very probable
click is classified as a non-click?

Thanks.

Best wishes,
Stanley Xu

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Apr 21, 2011 at 7:09 PM, Stanley Xu <we...@gmail.com> wrote:

>
> By adding a weight, you mean redefining a train method, adding a weight
> parameter, and adjusting the learning rate from currentLearningRate() with
> that parameter, not the weight parameter that already exists in the
> features. Am I correct?
>

Yes.

But you raise a good point.  If you want to adjust the weight during
encoding, that would have the same effect.  Make sure that you adjust all
of the encoding weights together.
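
For what it's worth, here is a minimal sketch of what adjusting the weight
during encoding could look like. It assumes the feature encoders in
org.apache.mahout.vectorizer.encoders; the feature names and vector size
are made up for illustration.

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.ConstantValueEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class WeightedEncoding {
  // Encode one impression, scaling every feature by the same per-example
  // weight so that the whole SGD update is scaled for this example.
  public static Vector encode(String site, String keyword, double weight) {
    StaticWordValueEncoder siteEncoder = new StaticWordValueEncoder("site");
    StaticWordValueEncoder keywordEncoder = new StaticWordValueEncoder("keyword");
    ConstantValueEncoder bias = new ConstantValueEncoder("intercept");

    Vector v = new RandomAccessSparseVector(1000);
    // The middle argument of addToVector is a weight multiplier. Use the
    // same weight everywhere, including the intercept, so the example is
    // up- or down-weighted rather than distorted.
    siteEncoder.addToVector(site, weight, v);
    keywordEncoder.addToVector(keyword, weight, v);
    bias.addToVector("", weight, v);
    return v;
  }
}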

Re: How could I set a loss function in SGD?

Posted by Stanley Xu <we...@gmail.com>.
Hi Ted,

I thought I got it, but wanted to confirm once again since I am not a
native English speaker.

By adding a weight, you mean redefining a train method, adding a weight
parameter, and adjusting the learning rate from currentLearningRate() with
that parameter, not the weight parameter that already exists in the
features. Am I correct?

Thanks for your patience with a machine learning newbie like me.

Best wishes,
Stanley Xu


On Fri, Apr 22, 2011 at 6:14 AM, Ted Dunning <te...@gmail.com> wrote:

>
>
> On Tue, Apr 19, 2011 at 11:02 PM, Stanley Xu <we...@gmail.com> wrote:
>
>> What still makes me a little confused is that, when training the model, I
>> probably know the errors. Could we say that the penalty I wanted is
>> already counted in the loss function?
>>
>
>  It could be, but usually isn't.
>
>
>> And for weighting the items differently, did you mean I should adjust the
>> number of positive and negative examples in the training dataset, like
>> doing down-sampling?
>>
>
> Repeating the samples is not good because it appears to be more data than
> it really is.
>
> Down-sampling positives and negatives differently won't work either because
> you are just adjusting the offset term in the logistic regression.  It is
> reasonable to down-sample the most common target in order to speed up
> learning and to avoid regularizing away positive features, but it won't
> really change the results in terms of AUC.  It will shift the threshold
> required for any desired false positive rate, but you could have shifted
> the threshold without down-sampling to get the same effect.
>
> Changing the weights should be done by passing a weight into the training
> method and using that as an additional factor on the learning rate.
>
>
>
>

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Apr 19, 2011 at 11:02 PM, Stanley Xu <we...@gmail.com> wrote:

> What still makes me a little confused is that, when training the model, I
> probably know the errors. Could we say that the penalty I wanted is
> already counted in the loss function?
>

It could be, but usually isn't.


> And for weighting the items differently, did you mean I should adjust the
> number of positive and negative examples in the training dataset, like
> doing down-sampling?
>

Repeating the samples is not good because it appears to be more data than it
really is.

Down-sampling positives and negatives differently won't work either because
you are just adjusting the offset term in the logistic regression.  It is
reasonable to down-sample the most common target in order to speed up
learning and to avoid regularizing away positive features, but it won't
really change the results in terms of AUC.  It will shift the threshold
required for any desired false positive rate, but you could have shifted
the threshold without down-sampling to get the same effect.
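
To make the offset point concrete (this is a standard property of logistic
regression, not anything Mahout-specific): if you keep every positive but
keep each negative only with probability r, the log-odds the model learns
are shifted by a constant,

  log(p' / (1 - p')) = log(p / (1 - p)) + log(1 / r)

where p is the true click probability and p' is the probability under the
sampled distribution. Only the intercept moves, so the ranking of examples,
and hence AUC, is unchanged; subtracting log(1/r) from the score, or moving
the decision threshold by the same amount, undoes the shift.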

Changing the weights should be done by passing a weight into the training
method and using that as an additional factor on the learning rate.
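
As a rough sketch of what that could look like in code (the WeightedOLR
class and its train(weight, ...) method are hypothetical, not part of
Mahout, and it assumes currentLearningRate() in OnlineLogisticRegression
can be overridden):

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class WeightedOLR extends OnlineLogisticRegression {
  private double exampleWeight = 1.0;

  public WeightedOLR(int numCategories, int numFeatures) {
    super(numCategories, numFeatures, new L1());
  }

  // Weighted variant of train(): stash the weight, then run the normal
  // update with a scaled learning rate.
  public void train(double weight, int actual, Vector instance) {
    exampleWeight = weight;
    try {
      train(actual, instance);
    } finally {
      exampleWeight = 1.0;
    }
  }

  @Override
  public double currentLearningRate() {
    // A weight-w example then pulls on the coefficients roughly like
    // w copies of it would, without actually repeating the data.
    return exampleWeight * super.currentLearningRate();
  }
}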

Re: How could I set a loss function in SGD?

Posted by Stanley Xu <we...@gmail.com>.
Hi Ted,

Thanks for your kindness in answering all these questions, and especially
for pointing out the difference between an error cost model and a loss
function.

What still makes me a little confused is that, when training the model, I
probably know the errors. Could we say that the penalty I wanted is
already counted in the loss function?

And for weighting the items differently, did you mean I should adjust the
number of positive and negative examples in the training dataset, like
doing down-sampling?

Thanks.

Best wishes,
Stanley Xu



On Wed, Apr 20, 2011 at 12:46 PM, Ted Dunning <te...@gmail.com> wrote:

> For this sort of thing, I think that you can simply weight the items
> differently.  You don't need a different loss function.
>
> The difference here is between an error cost model (what you want) and a
> loss function (which is internal to the learning algorithm).
>
> But since you don't really get to see all kinds of errors, I would think
> that this is quite dangerous.  The problem is that you don't get to know
> whether impressions that you didn't show would convert.  Since you can't
> weight that error at all, you might as well not worry about the weight
> applied to the other kind of error (you showed an ad and didn't get a
> click).
>
>
> On Tue, Apr 19, 2011 at 8:55 PM, Stanley Xu <we...@gmail.com> wrote:
>
>> Thanks for your reply, Ted. Basically, it would be the regularization you
>> mentioned before. But I have a half-formed idea that we might want a
>> different loss function: if we miss a valuable click, it is a big loss to
>> an ad network, but if we just get a low-value click, that is still
>> acceptable to an ad network. So the error/loss function might be adjusted
>> to reflect that situation.
>>
>> Best wishes,
>> Stanley Xu
>>
>>
>>
>>
>> On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> I don't understand this.
>>>
>>> Can you rephrase it or describe it more fully?
>>>
>>> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>>>
>>>> What should I do if I wanted to add a penalty for when a very probable
>>>> click is classified as a non-click?
>>>>
>>>
>>>
>>
>

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
For this sort of thing, I think that you can simply weight the items
differently.  You don't need a different loss function.

The difference here is between an error cost model (what you want) and a
loss function (which is internal to the learning algorithm).

But since you don't really get to see all kinds of errors, I would think
that this is quite dangerous.  The problem is that you don't get to know
whether impressions that you didn't show would convert.  Since you can't
weight that error at all, you might as well not worry about the weight
applied to the other kind of error (you showed an ad and didn't get a
click).

On Tue, Apr 19, 2011 at 8:55 PM, Stanley Xu <we...@gmail.com> wrote:

> Thanks for your reply, Ted. Basically, it would be the regularization you
> mentioned before. But I have a half-formed idea that we might want a
> different loss function: if we miss a valuable click, it is a big loss to
> an ad network, but if we just get a low-value click, that is still
> acceptable to an ad network. So the error/loss function might be adjusted
> to reflect that situation.
>
> Best wishes,
> Stanley Xu
>
>
>
>
> On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> I don't understand this.
>>
>> Can you rephrase it or describe it more fully?
>>
>> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>>
>>> What should I do if I wanted to add a penalty for when a very probable
>>> click is classified as a non-click?
>>>
>>
>>
>

Re: How could I set a loss function in SGD?

Posted by Stanley Xu <we...@gmail.com>.
Thanks for your reply, Ted. Basically, it would be the regularization you
mentioned before. But I have a half-formed idea that we might want a
different loss function: if we miss a valuable click, it is a big loss to
an ad network, but if we just get a low-value click, that is still
acceptable to an ad network. So the error/loss function might be adjusted
to reflect that situation.

Best wishes,
Stanley Xu



On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:

> I don't understand this.
>
> Can you rephrase it or describe it more fully?
>
> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>
>> What should I do if I wanted to add a penalty for when a very probable
>> click is classified as a non-click?
>>
>
>

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
I don't understand this.

Can you rephrase it or describe it more fully?

On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:

> What should I do if I wanted to add a penalty for when a very probable
> click is classified as a non-click?
>

Re: How could I set a loss function in SGD?

Posted by Stanley Xu <we...@gmail.com>.
Got you, Ted. Thanks a lot.

Best wishes,
Stanley Xu



On Wed, Apr 20, 2011 at 12:49 PM, Ted Dunning <te...@gmail.com> wrote:

> Great.
>
> To use grouped AUC, you simply need to pass in a group key into the
> training method.  I think that the current implementation is a little bit
> limited and possibly not entirely finished.  Read it carefully before using.
>  The major limitation that I remember is that memory usage scales with
> number of groups used.  If you can cluster users, that might help because
> you could substitute user cluster id for user.
>
> I don't know of any papers that cover this, but I have seen references in
> several papers so it is clearly a standard technique.
>
>
> On Tue, Apr 19, 2011 at 9:11 PM, Stanley Xu <we...@gmail.com> wrote:
>
>> Thanks Ted.
>>
>> For the MixedGradient you suggested, I found it in the codebase.
>> For the per user AUC, are there any papers you would suggest reading? Or
>> is there already an implementation in Mahout?
>>
>> Thanks.
>> Stanley Xu
>>
>>
>>
>> On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>>> The loss that is being optimized is, indeed, log-loss regularized by your
>>> choice of prior.
>>>
>>> Make sure that you are using AdaptiveLogisticRegression for CTR.  You
>>> almost certainly will also need to use per user AUC for learning the
>>> hyper-parameters.  Otherwise what happens is that you will just learn a
>>> model that finds users that click rather than user x opportunity
>>> combinations that cause clicks.
>>>
>>> There have been a number of experiments in changing the optimization of
>>> the SGD sequential logistic regression.  These include:
>>>
>>> a) mixed ranking and regression as the primitive error function
>>>
>>> b) per user AUC instead of standard AUC for optimizing the learning
>>> parameters
>>>
>>> To change the actual loss function in OnlineLogisticRegression, you
>>> have to change how the gradient field in AbstractOnlineLogisticRegression
>>> is set.  Currently that uses DefaultGradient, but it is easy to change.
>>>
>>> The reason that this isn't easy to do yet is that there hasn't been much
>>> call for alternatives.
>>>
>>>
>>> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>>>
>>>> Dear All,
>>>>
>>>> I am trying to use the SGD in Mahout to do an experiment on CTR
>>>> prediction. I am wondering how I could set a loss function for the
>>>> algorithm, and what default loss function the SGD is using. I haven't
>>>> had a chance to read the paper and code in detail, only to go through
>>>> them quickly. It looks like the SGD in Mahout just tries to maximize
>>>> the log likelihood of the model.
>>>>
>>>> What should I do if I wanted to add a penalty for when a very probable
>>>> click is classified as a non-click?
>>>>
>>>> Thanks.
>>>>
>>>> Best wishes,
>>>> Stanley Xu
>>>>
>>>
>>>
>>
>

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
Great.

To use grouped AUC, you simply need to pass in a group key into the training
method.  I think that the current implementation is a little bit limited and
possibly not entirely finished.  Read it carefully before using.  The major
limitation that I remember is that memory usage scales with number of groups
used.  If you can cluster users, that might help because you could
substitute user cluster id for user.

I don't know of any papers that cover this, but I have seen references in
several papers so it is clearly a standard technique.
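
Roughly, the usage looks like this. This is only a sketch: check that the
setAucEvaluator() method and the GroupedOnlineAuc class exist in your
version before relying on them, given the caveats above.

import org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.stats.GroupedOnlineAuc;

public class GroupedAucExample {
  public static AdaptiveLogisticRegression create(int numFeatures) {
    AdaptiveLogisticRegression learner =
        new AdaptiveLogisticRegression(2, numFeatures, new L1());
    // Score candidate models by AUC computed within each group rather
    // than across all impressions pooled together.
    learner.setAucEvaluator(new GroupedOnlineAuc());
    return learner;
  }

  public static void observe(AdaptiveLogisticRegression learner,
                             long impressionId, String userId,
                             int clicked, Vector features) {
    // The group key (a user id, or a user cluster id to bound memory)
    // is what makes the AUC per user.
    learner.train(impressionId, userId, clicked, features);
  }
}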

On Tue, Apr 19, 2011 at 9:11 PM, Stanley Xu <we...@gmail.com> wrote:

> Thanks Ted.
>
> For the MixedGradient you suggested, I found it in the codebase.
> For the per user AUC, are there any papers you would suggest reading? Or
> is there already an implementation in Mahout?
>
> Thanks.
> Stanley Xu
>
>
>
> On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> The loss that is being optimized is, indeed, log-loss regularized by your
>> choice of prior.
>>
>> Make sure that you are using AdaptiveLogisticRegression for CTR.  You
>> almost certainly will also need to use per user AUC for learning the
>> hyper-parameters.  Otherwise what happens is that you will just learn a
>> model that finds users that click rather than user x opportunity
>> combinations that cause clicks.
>>
>> There have been a number of experiments in changing the optimization of
>> the SGD sequential logistic regression.  These include:
>>
>> a) mixed ranking and regression as the primitive error function
>>
>> b) per user AUC instead of standard AUC for optimizing the learning
>> parameters
>>
>> To change the actual loss function in OnlineLogisticRegression, you have
>> to change how the gradient field in AbstractOnlineLogisticRegression is
>> set.  Currently that uses DefaultGradient, but it is easy to change.
>>
>> The reason that this isn't easy to do yet is that there hasn't been much
>> call for alternatives.
>>
>>
>> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>>
>>> Dear All,
>>>
>>> I am trying to use the SGD in Mahout to do an experiment on CTR
>>> prediction. I am wondering how I could set a loss function for the
>>> algorithm, and what default loss function the SGD is using. I haven't
>>> had a chance to read the paper and code in detail, only to go through
>>> them quickly. It looks like the SGD in Mahout just tries to maximize
>>> the log likelihood of the model.
>>>
>>> What should I do if I wanted to add a penalty for when a very probable
>>> click is classified as a non-click?
>>>
>>> Thanks.
>>>
>>> Best wishes,
>>> Stanley Xu
>>>
>>
>>
>

Re: How could I set a loss function in SGD?

Posted by Stanley Xu <we...@gmail.com>.
Thanks Ted.

For the MixedGradient you suggested, I found it in the codebase.
For the per user AUC, are there any papers you would suggest reading? Or is
there already an implementation in Mahout?

Thanks.
Stanley Xu



On Wed, Apr 20, 2011 at 12:49 AM, Ted Dunning <te...@gmail.com> wrote:

> The loss that is being optimized is, indeed, log-loss regularized by your
> choice of prior.
>
> Make sure that you are using AdaptiveLogisticRegression for CTR.  You
> almost certainly will also need to use per user AUC for learning the
> hyper-parameters.  Otherwise what happens is that you will just learn a
> model that finds users that click rather than user x opportunity
> combinations that cause clicks.
>
> There have been a number of experiments in changing the optimization of the
> SGD sequential logistic regression.  These include:
>
> a) mixed ranking and regression as the primitive error function
>
> b) per user AUC instead of standard AUC for optimizing the learning
> parameters
>
> To change the actual loss function in OnlineLogisticRegression, you have
> to change how the gradient field in AbstractOnlineLogisticRegression is
> set.  Currently that uses DefaultGradient, but it is easy to change.
>
> The reason that this isn't easy to do yet is that there hasn't been much
> call for alternatives.
>
>
> On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:
>
>> Dear All,
>>
>> I am trying to use the SGD in Mahout to do an experiment on CTR
>> prediction. I am wondering how I could set a loss function for the
>> algorithm, and what default loss function the SGD is using. I haven't
>> had a chance to read the paper and code in detail, only to go through
>> them quickly. It looks like the SGD in Mahout just tries to maximize
>> the log likelihood of the model.
>>
>> What should I do if I wanted to add a penalty for when a very probable
>> click is classified as a non-click?
>>
>> Thanks.
>>
>> Best wishes,
>> Stanley Xu
>>
>
>

Re: How could I set a loss function in SGD?

Posted by Ted Dunning <te...@gmail.com>.
The loss that is being optimized is, indeed, log-loss regularized by your
choice of prior.

Make sure that you are using AdaptiveLogisticRegression for CTR.  You almost
certainly will also need to use per user AUC for learning the
hyper-parameters.  Otherwise what happens is that you will just learn a
model that finds users that click rather than user x opportunity
combinations that cause clicks.
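
For reference, a minimal two-class setup looks something like this (a
sketch only; the feature count is arbitrary and the encoding of
impressions into Vectors is left out):

import org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression;
import org.apache.mahout.classifier.sgd.CrossFoldLearner;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.ep.State;
import org.apache.mahout.math.Vector;

public class CtrModel {
  // Two categories (click / no click); the L1 prior is one choice of
  // regularization among the available priors.
  private final AdaptiveLogisticRegression learner =
      new AdaptiveLogisticRegression(2, 1000, new L1());

  public void observe(long impressionId, int clicked, Vector features) {
    // The learning hyper-parameters are tuned online, so there is no
    // manual learning-rate schedule to set here.
    learner.train(impressionId, clicked, features);
  }

  public double estimatedCtr(Vector features) {
    // Note: getBest() can return null before enough data has been seen.
    State<AdaptiveLogisticRegression.Wrapper, CrossFoldLearner> best =
        learner.getBest();
    // classifyScalar returns P(click | features) in the binary case.
    return best.getPayload().getLearner().classifyScalar(features);
  }
}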

There have been a number of experiments in changing the optimization of the
SGD sequential logistic regression.  These include:

a) mixed ranking and regression as the primitive error function

b) per user AUC instead of standard AUC for optimizing the learning
parameters

To change the actual loss function in OnlineLogisticRegression, you have to
change how the gradient field in AbstractOnlineLogisticRegression is set.
Currently that uses DefaultGradient, but it is easy to change.

The reason that this isn't easy to do yet is that there hasn't been much
call for alternatives.
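
To make that concrete, here is a sketch of a cost-sensitive Gradient in the
spirit of DefaultGradient, scaling up the error whenever an actual click is
scored low, which is the penalty asked about above. The Gradient interface
and DefaultGradient are real classes; this implementation and its
missedClickCost parameter are hypothetical, and wiring it in currently
means editing where the gradient field is set.

import org.apache.mahout.classifier.AbstractVectorClassifier;
import org.apache.mahout.classifier.sgd.Gradient;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.function.Functions;

public class CostSensitiveGradient implements Gradient {
  private final double missedClickCost; // > 1 penalizes missed clicks more

  public CostSensitiveGradient(double missedClickCost) {
    this.missedClickCost = missedClickCost;
  }

  @Override
  public Vector apply(String groupKey, int actual, Vector instance,
                      AbstractVectorClassifier classifier) {
    // Same residual as DefaultGradient: (target - predicted probability).
    Vector v = classifier.classify(instance);
    Vector r = v.like();
    if (actual != 0) {
      r.setQuick(actual - 1, 1);
    }
    r.assign(v, Functions.MINUS);
    // Scale the update for actual clicks so that under-predicting a click
    // costs more than over-predicting a non-click.
    if (actual != 0) {
      r.assign(Functions.mult(missedClickCost));
    }
    return r;
  }
}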

On Tue, Apr 19, 2011 at 1:33 AM, Stanley Xu <we...@gmail.com> wrote:

> Dear All,
>
> I am trying to use the SGD in Mahout to do an experiment on CTR
> prediction. I am wondering how I could set a loss function for the
> algorithm, and what default loss function the SGD is using. I haven't
> had a chance to read the paper and code in detail, only to go through
> them quickly. It looks like the SGD in Mahout just tries to maximize
> the log likelihood of the model.
>
> What should I do if I wanted to add a penalty for when a very probable
> click is classified as a non-click?
>
> Thanks.
>
> Best wishes,
> Stanley Xu
>