Posted to user@mahout.apache.org by Nabarun <se...@gmail.com> on 2011/06/10 13:54:28 UTC

Stochastic gradient algorithm related queries

I was going through the Mahout code and have a couple of queries related to the
OnlineRegression algorithm (the stochastic gradient descent implementation of
logistic regression).

1. I saw that in the CrossFolder program the log likelihood is computed as

    logLikelihood += (Math.log(score) - logLikelihood) / Math.min(records, windowSize);

My query is: can't we instead use the plain sum

    logLikelihood = sum of log(p) (when y = 1) or log(1 - p) (when y = 0)

where the log term is computed online for each row?
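
For concreteness, here is a rough sketch of what I mean (plain Java, not taken
from the Mahout source; p[i] is the predicted probability that row i is
positive):

    // Hypothetical helper, not Mahout code: total log likelihood over all rows.
    // p[i] is the predicted probability that y[i] == 1; y[i] is 0 or 1.
    static double totalLogLikelihood(double[] p, int[] y) {
      double sum = 0.0;
      for (int i = 0; i < p.length; i++) {
        sum += (y[i] == 1) ? Math.log(p[i]) : Math.log(1.0 - p[i]);
      }
      return sum;
    }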

2. The learning rate is calculated as

    currentLearningRate = mu0 * Math.pow(decayFactor, getStep())
        * Math.pow(getStep() + stepOffset, forgettingExponent);

Can we instead use

    learningRate(epoch) = initialLearningRate / (1 + epoch / annealingRate)

i.e. an inverse learning rate schedule, since it is guaranteed to converge to a
limit?

Reference: http://alias-i.com/lingpipe-3.9.3/docs/api/com/aliasi/stats/AnnealingSchedule.html

Thanks
Nabarun


Re: Stochastic gradient algorithm related queries

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Jun 10, 2011 at 1:54 PM, Nabarun <se...@gmail.com> wrote:
> I was going through the Mahout code and have a couple of queries related to the
> OnlineRegression algorithm (the stochastic gradient descent implementation of
> logistic regression).
>
> 1. I saw that in the CrossFolder program the log likelihood is computed as
>
>    logLikelihood += (Math.log(score) - logLikelihood) / Math.min(records, windowSize);
>
> My query is: can't we instead use the plain sum
>
>    logLikelihood = sum of log(p) (when y = 1) or log(1 - p) (when y = 0)

How is that different from what is done?

The current code keeps only an exponential moving average.  See
http://tdunning.blogspot.com/2011_05_01_archive.html
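
In other words, the update tracks a running average of the per-record log
likelihood over roughly the last windowSize records rather than an unbounded
sum. A stripped-down sketch of that idea (illustrative only, not the exact
Mahout source; score is taken to be the probability the model assigned to the
correct label for the current record):

    // Illustrative sketch: exponential moving average of per-record log likelihood.
    // For the first windowSize records this is the exact running mean; afterwards
    // old records are decayed exponentially instead of being summed forever.
    static double updateLogLikelihood(double logLikelihood, double score,
                                      long records, int windowSize) {
      double ll = Math.log(score);   // log likelihood of the current record
      return logLikelihood + (ll - logLikelihood) / Math.min(records, windowSize);
    }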


>
> 2. The learning rate is calculated as
>
> currentLearningRate = mu0 * Math.pow(decayFactor, getStep())
>     * Math.pow(getStep() + stepOffset, forgettingExponent);
>
> Can we instead use
>
>    learningRate(epoch) = initialLearningRate / (1 + epoch / annealingRate)

Again, that looks a lot like a special case of what is here.

> i.e. an inverse learning rate schedule, since it is guaranteed to converge to a
> limit?

This is a theoretical guarantee that is not always helpful in practice.
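
For what it's worth, the inverse schedule can be read as a special case of the
existing formula: with decayFactor = 1 and forgettingExponent = -1 it reduces to
mu0 / (step + stepOffset), which has the same 1/t shape as
initialLearningRate / (1 + epoch / annealingRate). A quick sketch of the
comparison (parameter names here are illustrative, not the exact Mahout API):

    // Illustrative comparison of the two schedules; not Mahout code.
    public class RateComparison {
      static double generalRate(double mu0, double decayFactor,
                                double forgettingExponent, double stepOffset,
                                long step) {
        return mu0 * Math.pow(decayFactor, step)
            * Math.pow(step + stepOffset, forgettingExponent);
      }

      static double inverseRate(double initialRate, double annealingRate, long epoch) {
        return initialRate / (1.0 + epoch / annealingRate);
      }

      public static void main(String[] args) {
        double eta0 = 0.1;
        double anneal = 100.0;
        for (long step = 0; step <= 1000; step += 200) {
          // With decayFactor = 1, forgettingExponent = -1, mu0 = eta0 * anneal and
          // stepOffset = anneal, the general formula reproduces the inverse schedule.
          double general = generalRate(eta0 * anneal, 1.0, -1.0, anneal, step);
          double inverse = inverseRate(eta0, anneal, step);
          System.out.printf("step %4d  general %.6f  inverse %.6f%n", step, general, inverse);
        }
      }
    }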