Posted to user@mahout.apache.org by Frank Wang <wa...@gmail.com> on 2010/11/10 10:26:33 UTC

Re: Implementation for Linear Regression

With linear regression, it seems that the coefficients tend to grow
without bound when using a larger learning rate, e.g. --rate 50.
It works fine when I keep the rate < 1.

Is this a normal characteristic of linear regression?
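
The behaviour described above can be reproduced with a minimal sketch, assuming plain SGD on squared loss for a one-feature model y = w * x. The class name, toy data, and step count are hypothetical and this is not the code from the MAHOUT-529 patch. Under these assumptions each update multiplies the coefficient error by (1 - rate * x^2), so the error shrinks only when rate < 2 / x^2; for x = 3 that bound is roughly 0.22.

```java
// Minimal sketch (not the MAHOUT-529 patch): plain SGD for a one-feature
// linear model y = w * x with squared loss and a fixed learning rate.
class FixedRateSgd {
    // Hypothetical toy data: inputs cycle through X, true coefficient is 2.0.
    static final double[] X = {1.0, 2.0, 3.0};
    static final double TRUE_W = 2.0;

    /** Runs `steps` SGD updates with a constant rate and returns the final w. */
    static double train(double rate, int steps) {
        double w = 0.0;
        for (int t = 0; t < steps; t++) {
            double x = X[t % X.length];
            double y = TRUE_W * x;
            double error = y - w * x;   // residual for this example
            w += rate * error * x;      // gradient step on 0.5 * error^2
        }
        return w;
    }

    public static void main(String[] args) {
        // A small rate settles near the true coefficient.
        System.out.println("rate 0.05 -> w = " + train(0.05, 200));
        // A rate far above 2 / x^2 amplifies the error on every step.
        System.out.println("rate 50   -> w = " + train(50.0, 200));
    }
}
```

With the small rate the coefficient settles close to 2.0, while with rate 50 every step amplifies the error until the value overflows.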

On Fri, Oct 22, 2010 at 1:08 AM, Frank Wang <wa...@gmail.com> wrote:

> Thanks Ted.
>
> It's a very interesting solution. Currently, we need to account for
> age-related terms when calculating the relevance ranking, and this is done
> before display time. We will play around with our data and see if we can
> model it to take advantage of the trick.
>
> In terms of Linear Regression, I've attached the initial patch on
> MAHOUT-529 <https://issues.apache.org/jira/browse/MAHOUT-529>. It's mainly
> the AbstractOnlineLinearRegression and OnlineLinearRegression classes. Lemme
> know if the code makes sense.
>
> I have 2 questions:
>
> 1.
> The apply() function in DefaultGradient has:
>     Vector r = v.like();
>     if (actual != 0) {
>       r.setQuick(actual - 1, 1);
>     }
>
> The code seems to work only for logistic regression. When actual is 0, r[0]
> remains 0, and when actual is 1, r[0] gets set to 1. I'm not sure if I'm
> understanding it correctly. For now, I've included DefaultGradientLinear in
> the patch as a workaround. If you could give me some advice, that'd be
> helpful.
>
>
> 2.
> As I'm working on the sample code TrainLinear, I was referring to the
> TrainLogistic code. I'm confused by this line:
>          int targetValue = csv.processLine(line, input);
>
> The training file is:
> "a","b","c","target"
> 3,1,10,1
> 2,1,10,1
> 1,0,2,0
> ...
>
> But the output for processLine() is:
> Line 1: targetValue = 0, input = {2:4.0, 1:10.0, 0:1.0}
> Line 2: targetValue = 0, input = {2:3.0, 1:10.0, 0:1.0}
> Line 3: targetValue = 1, input = {2:1.0, 1:2.0, 0:1.0}
> ...
>
> It seems the target values are inverted, and some input values are
> incremented. It'd be great if you could explain processLine() a little
> bit.
>
> btw, is the mailing list a good place for implementation discussion or should
> it take place on the JIRA page?
>
> Thanks
>
>
> On Wed, Oct 20, 2010 at 9:58 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> You don't have to apply the age correction to old data until you display
>> the data.  The trick is to store all of the fixed components of the rating
>> in linear form and then add only the age-related terms at display time.
>> This allows you to penalize items that are unlikely to be relevant due to
>> age and doesn't require any recomputation.
>>
>> On Wed, Oct 20, 2010 at 9:32 PM, Frank Wang <wa...@gmail.com> wrote:
>>
>> > Hi Ted,
>> >
>> > I've created the JIRA issue at
>> > https://issues.apache.org/jira/browse/MAHOUT-529 and will attach what I
>> > have soon.
>> >
>> > Do you mean using time as a feature in the logistic regression? I
>> > thought about your suggestion the other day, but I'm not re-calculating
>> > the probability on the old data. After training each night, we only
>> > apply the coefficients to the next day's new data. I'm not quite sure
>> > how the decay function would work in this case. Do you have an example?
>> >
>> > Thanks
>> >
>> >
>> > On Wed, Oct 20, 2010 at 8:48 PM, Ted Dunning <te...@gmail.com>
>> > wrote:
>> >
>> > > Can you open a JIRA and attach a patch?
>> > >
>> > > Your approach seems reasonable so far for the regression.
>> > >
>> > > In terms of how it could be applied, it seems like you are trying to
>> > > estimate a life-span for a posting to model relevance decay.
>> > >
>> > > My own preference there would be to try to estimate relevance (0 or 1)
>> > > using logistic regression and then put in various decay functions as
>> > > features.  The weighted sum of those decay functions is your time decay
>> > > of relevance (in log-odds).
>> > >
>> > > My initial shot at decay functions would include age, square of age and
>> > > log of age.  My guess is that direct age would suffice because of the
>> > > logistic link function which looks like a logarithmic function where
>> > > your models will probably live.
>> > >
>> > > On Wed, Oct 20, 2010 at 8:15 PM, Frank Wang <wa...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Ted,
>> > > >
>> > > > Thanks for your reply.
>> > > > I'm trying a new model where I want to estimate the output as a
>> > > > timespan quantified in number of seconds, which is not bounded.
>> > > > That's why I think I'd use linear regression instead of logistic
>> > > > regression. (lemme know if I'm wrong)
>> > > >
>> > > > I started on the code yesterday. The new
>> > > > AbstractOnlineLinearRegression class is implementing the OnlineLearner
>> > > > interface. I updated the classify() function to use a linear model. I
>> > > > tried to follow the format for AbstractOnlineLogisticRegression.
>> > > >
>> > > > I think since linear regression can be implemented w/ sgd, the
>> > > > train() and regularize() functions would look similar. I'm not sure
>> > > > if I'm on the right path. Any advice would be helpful.
>> > > >
>> > > > Thanks
>> > > >
>> > > > On Wed, Oct 20, 2010 at 3:34 PM, Ted Dunning <ted.dunning@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Frank,
>> > > > >
>> > > > > Sorry I didn't answer your previous email regarding this.
>> > > > >
>> > > > > It sounded to me like your application would actually be happier
>> > > > > with a form of logistic regression.
>> > > > >
>> > > > > Perhaps we should talk some more about this on the list.
>> > > > >
>> > > > > If you want a normal linear regression, the current OnlineLearner
>> > > > > interface isn't terribly appropriate since it assumes a 1-of-n
>> > > > > vector target variable.
>> > > > >
>> > > > > If you were to extend that interface to accept a vector form of
>> > > > > target variable then linear regression would work (and some clever
>> > > > > tricks would become possible for logistic regression).
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Oct 20, 2010 at 1:57 PM, Frank Wang <wangfanjie@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I'm interested in implementing Linear Regression in Mahout. Who
>> > > > > > would be the point person for the algorithm? I'd love to discuss
>> > > > > > the implementation details, or to help out if anyone is working
>> > > > > > on it already :)
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Implementation for Linear Regression

Posted by Ted Dunning <te...@gmail.com>.

This is common for any online learning algorithm.

Normally a declining learning rate is used.  This is called annealing.  If
the learning rate starts too large but anneals reasonably quickly, you wind
up wasting some data and have to recover from some crazy coefficients, but
it generally works rather well.
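
A minimal sketch of annealing, assuming a simple rate0 / (1 + t) schedule, toy noise-free data, and a one-feature model y = w * x. The class name, method names, and schedule are hypothetical choices for illustration, not Mahout's implementation. It starts from the rate of 50 mentioned in the question: the early steps throw the coefficient far from the true value, and the declining rate then lets it recover.

```java
// Minimal sketch of an annealed (declining) learning rate for SGD on a
// one-feature linear model. The 1 / (1 + t) schedule is an assumption
// chosen for illustration; it is not necessarily what Mahout uses.
class AnnealedSgd {
    static final double[] X = {1.0, 2.0, 3.0};   // hypothetical toy inputs
    static final double TRUE_W = 2.0;            // coefficient to recover

    /** Learning rate at step t: starts at initialRate and decays like 1 / t. */
    static double rateAt(double initialRate, int t) {
        return initialRate / (1.0 + t);
    }

    /** Runs SGD with the annealed rate; returns {largest |w| seen, final w}. */
    static double[] train(double initialRate, int steps) {
        double w = 0.0;
        double maxAbsW = 0.0;
        for (int t = 0; t < steps; t++) {
            double x = X[t % X.length];
            double y = TRUE_W * x;
            // Same squared-loss update as with a fixed rate, but the step
            // size shrinks as more examples are seen.
            w += rateAt(initialRate, t) * (y - w * x) * x;
            maxAbsW = Math.max(maxAbsW, Math.abs(w));
        }
        return new double[] {maxAbsW, w};
    }

    public static void main(String[] args) {
        double[] result = train(50.0, 2000);
        System.out.println("largest |w| along the way: " + result[0]);
        System.out.println("final w: " + result[1]);
    }
}
```

The early examples are effectively wasted on the oversized steps, which matches the trade-off described above. A schedule that decays too fast (for example a steep geometric decay) may freeze the coefficients before they recover, so the decay speed is itself a tuning choice.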

On Wed, Nov 10, 2010 at 1:26 AM, Frank Wang <wa...@gmail.com> wrote:

> With linear regression, it seems that the coefficients tend to grow
> without bound when using a larger learning rate, e.g. --rate 50.
> It works fine when I keep the rate < 1.
>
> Is this a normal characteristic of linear regression?