Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2011/12/16 19:05:29 UTC

RE: ALS-WR and reg rate discussion

Hi,

I vaguely remember the discussion about finding the optimum regularization
rate in the ALS-WR work.

Would it make sense to take a subsample (or, rather, a random
submatrix) of the original input and try to find the optimum for it
somehow, similar to the total order partitioner's distribution sampling?

I have put ALS with regularization and ALS-WR (and will put the
implicit feedback paper as well) into R code, and I was wondering if it
makes sense to find a better guess for lambda by just doing an R
simulation on randomly subsampled data before putting it into the
pipeline, or whether there is a fundamental problem with this approach.
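
To make the idea concrete, here is a toy Python sketch (the dense ALS-WR
solver, the sampling fractions, and the lambda grid below are illustrative
stand-ins rather than the actual R code):

    import numpy as np

    def als_wr(R, M, rank, lmbda, iters=15, seed=0):
        # Toy ALS-WR on a dense ratings matrix R with boolean observation
        # mask M. Each least-squares solve uses the weighted-lambda penalty
        # lambda * n_obs * I.
        rng = np.random.default_rng(seed)
        m, n = R.shape
        U = rng.normal(scale=0.1, size=(m, rank))
        V = rng.normal(scale=0.1, size=(n, rank))
        I = np.eye(rank)
        for _ in range(iters):
            for u in range(m):
                obs = M[u]
                if obs.any():
                    Vo = V[obs]
                    U[u] = np.linalg.solve(Vo.T @ Vo + lmbda * obs.sum() * I,
                                           Vo.T @ R[u, obs])
            for i in range(n):
                obs = M[:, i]
                if obs.any():
                    Uo = U[obs]
                    V[i] = np.linalg.solve(Uo.T @ Uo + lmbda * obs.sum() * I,
                                           Uo.T @ R[obs, i])
        return U, V

    def rmse(R, mask, U, V):
        # RMSE over the cells selected by `mask`.
        err = (R - U @ V.T)[mask]
        return np.sqrt(np.mean(err ** 2))

    def pick_lambda_on_subsample(R, M, lambdas, rank=10, frac=0.2, seed=0):
        # Take a random submatrix, hold out ~20% of its observed cells,
        # and return the lambda with the best hold-out RMSE.
        rng = np.random.default_rng(seed)
        rows = rng.choice(R.shape[0], max(1, int(frac * R.shape[0])), replace=False)
        cols = rng.choice(R.shape[1], max(1, int(frac * R.shape[1])), replace=False)
        Rs, Ms = R[np.ix_(rows, cols)], M[np.ix_(rows, cols)]
        hold = Ms & (rng.random(Ms.shape) < 0.2)
        train = Ms & ~hold
        scores = [rmse(Rs, hold, *als_wr(Rs, train, rank, l)) for l in lambdas]
        return lambdas[int(np.argmin(scores))]

    # e.g. best = pick_lambda_on_subsample(R, M, lambdas=(0.01, 0.05, 0.1, 0.5))

This runs in seconds on a small synthetic matrix; the open question is how
well the winning lambda transfers to the full matrix.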

Thanks.
-Dmitriy

Re: ALS-WR and reg rate discussion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK,

yet another crazy idea of mine.

Generally, if we coerce the factorization to the classical SVD form with
singular values, then Tikhonov regularization can probably be optimized
post-decomposition. Indeed, I can see no reason why we can't control
the smoothing at the prediction stage by hacking the predictor as in
the following. Consequently, if we can, then we can also optimize the
degree of smoothness on hold-out data after the decomposition is done,
perhaps even cross-fold, but we don't have to rerun the decomposition
again and again. Finally, a hack for ALS-WR is enclosed too, but in this
case it is more intuitive than derived. I guess I can try it out in R.

It is still possible that this is complete nonsense, of course.

https://docs.google.com/open?id=0B883AxfQlYWANDllNWQ1ZDQtOTEzOS00MWM3LWI4MjItNDQ3MDg2ZWMzMmE3
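
For concreteness, here is roughly what such a predictor hack could look like
in Python. This is my reading of the general technique (standard Tikhonov
filter factors applied to an SVD), not a transcription of the linked
document; the hold-out pieces in the sweep are hypothetical placeholders:

    import numpy as np

    def smoothed_predict(U, s, Vt, lmbda):
        # Tikhonov-style filter factors f_k = s_k^2 / (s_k^2 + lambda) damp
        # the small singular values at prediction time, so lambda can be
        # swept on hold-out data without recomputing the decomposition.
        f = s ** 2 / (s ** 2 + lmbda)
        return (U * (s * f)) @ Vt          # equals U @ diag(s * f) @ Vt

    # Decompose once, then vary only the predictor.
    # A_train, holdout_cells and rmse_on(...) are hypothetical placeholders.
    # U, s, Vt = np.linalg.svd(A_train, full_matrices=False)
    # errs = {l: rmse_on(holdout_cells, smoothed_predict(U, s, Vt, l))
    #         for l in (0.01, 0.1, 1.0, 10.0)}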

Re: ALS-WR and reg rate discussion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I just suspect there must have been some research or study done on how
accurately factorization problems can be solved on a subsample,
similar to standard errors and confidence intervals. E.g., I know how
many samples I need to fit the observed mean into a certain confidence
interval, provided I know the original distribution. A similar estimate
is sought for a factorization problem, assuming some standard mixture
model.
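
For the mean, the textbook version of that estimate looks like this (just
the standard normal-approximation sample-size bound; the analogous bound for
a factorization is exactly what is being asked about):

    import numpy as np
    from scipy.stats import norm

    def samples_for_mean(sigma, half_width, alpha=0.05):
        # Smallest n such that the observed mean of n i.i.d. samples with
        # known sd `sigma` lands within +/- half_width of the true mean with
        # confidence 1 - alpha:  n >= (z_{1-alpha/2} * sigma / half_width)^2
        z = norm.ppf(1.0 - alpha / 2.0)
        return int(np.ceil((z * sigma / half_width) ** 2))

    # e.g. samples_for_mean(sigma=1.0, half_width=0.05) -> 1537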

-d

On Fri, Dec 16, 2011 at 10:56 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> The problem is convex, but the idea is not to use MapReduce but to take a
> subsample and solve it in memory on the reduced sample (I was actually
> thinking of a simple bisection rather than trying to fit anything), but
> that's not the point.
>
> The point is how accurately the solution found on a random subsample
> would reflect the actual optimum on the whole data set.
>
>
>
> On Fri, Dec 16, 2011 at 10:50 AM, Raphael Cendrillon
> <ce...@gmail.com> wrote:
>> Hi Dmitry,
>>
>> I have a feeling the objective may be very close to convex. In that case there are faster approaches than random subsampling.
>>
>> A common strategy, for example, is to fit a quadratic to the previously evaluated lambda values and then solve for its minimum.
>>
>> This is an iterative approach, so it wouldn't fit well within MapReduce, but if you are thinking of doing this as a preprocessing step it would be OK.
>>
>> On Dec 16, 2011, at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I vaguely remember the discussion about finding the optimum regularization
>>> rate in the ALS-WR work.
>>>
>>> Would it make sense to take a subsample (or, rather, a random
>>> submatrix) of the original input and try to find the optimum for it
>>> somehow, similar to the total order partitioner's distribution sampling?
>>>
>>> I have put ALS with regularization and ALS-WR (and will put the
>>> implicit feedback paper as well) into R code, and I was wondering if it
>>> makes sense to find a better guess for lambda by just doing an R
>>> simulation on randomly subsampled data before putting it into the
>>> pipeline, or whether there is a fundamental problem with this approach.
>>>
>>> Thanks.
>>> -Dmitriy

Re: ALS-WR and reg rate discussion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
The problem is convex, but the idea is not to use MapReduce but to take a
subsample and solve it in memory on the reduced sample (I was actually
thinking of a simple bisection rather than trying to fit anything), but
that's not the point.

The point is how accurately the solution found on a random subsample
would reflect the actual optimum on the whole data set.



On Fri, Dec 16, 2011 at 10:50 AM, Raphael Cendrillon
<ce...@gmail.com> wrote:
> Hi Dmitry,
>
> I have a feeling the objective may be very close to convex. In that case there are faster approaches than random subsampling.
>
> A common strategy, for example, is to fit a quadratic to the previously evaluated lambda values and then solve for its minimum.
>
> This is an iterative approach, so it wouldn't fit well within MapReduce, but if you are thinking of doing this as a preprocessing step it would be OK.
>
> On Dec 16, 2011, at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> Hi,
>>
>> I vaguely remember the discussion about finding the optimum regularization
>> rate in the ALS-WR work.
>>
>> Would it make sense to take a subsample (or, rather, a random
>> submatrix) of the original input and try to find the optimum for it
>> somehow, similar to the total order partitioner's distribution sampling?
>>
>> I have put ALS with regularization and ALS-WR (and will put the
>> implicit feedback paper as well) into R code, and I was wondering if it
>> makes sense to find a better guess for lambda by just doing an R
>> simulation on randomly subsampled data before putting it into the
>> pipeline, or whether there is a fundamental problem with this approach.
>>
>> Thanks.
>> -Dmitriy

Re: ALS-WR and reg rate discussion

Posted by Ted Dunning <te...@gmail.com>.
Not a bad idea at all. The objective function is probably very
asymmetrical when expressed as a function of the raw value of lambda.
Transforming lambda (e.g., working on a log scale) might help with that.
Either way, the asymmetry shouldn't be all that big a deal if you put a
constrained 1-d optimizer on the problem.
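
For instance, a constrained scalar search over log(lambda) might look like
this in Python (holdout_rmse is a hypothetical stand-in for "fit with this
lambda and score on held-out data"; the bounds are made up):

    from scipy.optimize import minimize_scalar

    def holdout_rmse(lmbda):
        # Hypothetical: fit ALS-WR (possibly on a subsample) with this
        # lambda and return RMSE on held-out ratings.
        ...

    # Searching in log10(lambda) makes the objective far less lopsided, and
    # the bounded method keeps the search inside a sane range.
    res = minimize_scalar(lambda t: holdout_rmse(10.0 ** t),
                          bounds=(-4.0, 1.0), method="bounded")
    best_lambda = 10.0 ** res.x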

On Fri, Dec 16, 2011 at 10:50 AM, Raphael Cendrillon <cendrillon1978@gmail.com> wrote:

> Hi Dmitry,
>
> I have a feeling the objective may be very close to convex. In that case
> there are faster approaches than random subsampling.
>
> A common strategy, for example, is to fit a quadratic to the previously
> evaluated lambda values and then solve for its minimum.
>
> This is an iterative approach, so it wouldn't fit well within MapReduce,
> but if you are thinking of doing this as a preprocessing step it would be OK.
>
> On Dec 16, 2011, at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
> > Hi,
> >
> > I vaguely remember the discussion about finding the optimum regularization
> > rate in the ALS-WR work.
> >
> > Would it make sense to take a subsample (or, rather, a random
> > submatrix) of the original input and try to find the optimum for it
> > somehow, similar to the total order partitioner's distribution sampling?
> >
> > I have put ALS with regularization and ALS-WR (and will put the
> > implicit feedback paper as well) into R code, and I was wondering if it
> > makes sense to find a better guess for lambda by just doing an R
> > simulation on randomly subsampled data before putting it into the
> > pipeline, or whether there is a fundamental problem with this approach.
> >
> > Thanks.
> > -Dmitriy
>

Re: ALS-WR and reg rate discussion

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Dmitry,

I have a feeling the objective may be very close to convex. In that case there are faster approaches than random subsampling.

A common strategy, for example, is to fit a quadratic to the previously evaluated lambda values and then solve for its minimum.

This is an iterative approach, so it wouldn't fit well within MapReduce, but if you are thinking of doing this as a preprocessing step it would be OK.
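
As a sketch of that quadratic-fit step in Python (the lambda/RMSE pairs here
are made-up illustration values, not measurements):

    import numpy as np

    # Points already evaluated: lambda vs. hold-out RMSE (illustrative values).
    lams = np.array([0.01, 0.05, 0.20, 0.80])
    errs = np.array([1.31, 1.12, 1.05, 1.24])

    # Fit a parabola in log10(lambda), which tends to be better behaved, and
    # take its vertex as the next lambda to evaluate.
    a, b, c = np.polyfit(np.log10(lams), errs, 2)
    next_lam = 10.0 ** (-b / (2.0 * a))   # vertex of a*t^2 + b*t + c (valid when a > 0)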

On Dec 16, 2011, at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Hi,
> 
> I vaguely remember the discussion about finding the optimum regularization
> rate in the ALS-WR work.
>
> Would it make sense to take a subsample (or, rather, a random
> submatrix) of the original input and try to find the optimum for it
> somehow, similar to the total order partitioner's distribution sampling?
>
> I have put ALS with regularization and ALS-WR (and will put the
> implicit feedback paper as well) into R code, and I was wondering if it
> makes sense to find a better guess for lambda by just doing an R
> simulation on randomly subsampled data before putting it into the
> pipeline, or whether there is a fundamental problem with this approach.
> 
> Thanks.
> -Dmitriy

Re: ALS-WR and reg rate discussion

Posted by Ted Dunning <te...@gmail.com>.
Sort of, kind of, but it is hard to extrapolate over a size range of more
than 10x, and that is the scale difference we are talking about.



On Fri, Dec 16, 2011 at 11:44 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> And there's no way to estimate the difference for a bigger sample?
>
> On Fri, Dec 16, 2011 at 11:37 AM, Ted Dunning <te...@gmail.com>
> wrote:
> > This doesn't work because the correct value for a sub-sampled batch will
> > be smaller than for a full data set.
> >
> > On Fri, Dec 16, 2011 at 10:05 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >
> >> if it
> >> makes sense to find a better guess for lambda by just doing an R
> >> simulation on randomly subsampled data before putting it into the
> >> pipeline, or whether there is a fundamental problem with this approach.
> >>
>

Re: ALS-WR and reg rate discussion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
And there's no way to estimate the difference for a bigger sample?

On Fri, Dec 16, 2011 at 11:37 AM, Ted Dunning <te...@gmail.com> wrote:
> This doesn't work because the correct value for a sub-sampled batch will be
> smaller than for a full data set.
>
> On Fri, Dec 16, 2011 at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> if it
>> makes sense to find a better guess for lambda by just doing an R
>> simulation on randomly subsampled data before putting it into the
>> pipeline, or whether there is a fundamental problem with this approach.
>>

Re: ALS-WR and reg rate discussion

Posted by Ted Dunning <te...@gmail.com>.
This doesn't work because the correct value for a sub-sampled batch will be
smaller than for a full data set.

On Fri, Dec 16, 2011 at 10:05 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> if it
> makes sense to find a better guess for lambda by just doing an R
> simulation on randomly subsampled data before putting it into the
> pipeline, or whether there is a fundamental problem with this approach.
>