Posted to dev@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2011/05/25 02:12:39 UTC

LatentLogLinear code

Ted,

So you do in-memory latent factor computation? I think this is the
same technique Koren implied for learning latent factors.

However, I never understood why this factorization must come up with
the r best factors. I understand the incremental SVD approach
(essentially the same thing, except that learning factors one at a
time guarantees we capture the best ones), but if we do it all in
parallel, does it do any good in your trials?

Also, I thought the cold start problem is helped by the fact that we
learn the weights first, so on their own they always give the best
approximation, and then the user-item interactions reveal specifics
about the user and the item. However, if we learn them all at the same
time, it does not seem obvious to me that we'd be learning the best
approximation when the latent factors are unknown (new users). Also,
in that implementation I can't see the side info training at all --
is it there?

Thanks.
-D

Re: LatentLogLinear code

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yes, I was asking about the lll branch in your repo.

-d

On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com> wrote:
> Heh?
>
> On Tue, May 24, 2011 at 5:12 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> So you do in-memory latent factor computation? I think this is the
>> same technique Koren implied for learning latent factors.
>>
>
> Are you referring to the Log linear latent factor code that I have in my
> mahout-525 github repo?
>
> Or SVD (L-2 latent factor computations)?
>
> Or LDA (multinomial latent factor computation)?
>
> Or Dirichlet process clustering?
>
> Or ...?
>
>
>> However, I never understood why this factorization must come up with
>> the r best factors. I understand the incremental SVD approach
>> (essentially the same thing, except that learning factors one at a
>> time guarantees we capture the best ones), but if we do it all in
>> parallel, does it do any good in your trials?
>>
>
> I don't understand the question.
>
> Are you asking whether the random projection code finds the best (largest)
> singular
> values and corresponding vectors?  If so, the answer is yes, it does with
> high probability
> of low error.
>
> But this doesn't sound like LLL stuff.
>
>
>> Also, I thought the cold start problem is helped by the fact that we
>> learn the weights first, so on their own they always give the best
>> approximation, and then the user-item interactions reveal specifics
>> about the user and the item. However, if we learn them all at the
>> same time, it does not seem obvious to me that we'd be learning the
>> best approximation when the latent factors are unknown (new users).
>> Also, in that implementation I can't see the side info training at
>> all -- is it there?
>>
>
> This sounds like LLL again.
>
> In LLL, the optimization of side information coefficients and the
> optimization of the user-item interactions are separately convex, but not
> jointly convex.  This means that you pretty much have to proceed by
> optimizing one, then the other, then the first and so on.
>
> So I don't see how we are learning them all at the same time.
>
> You are correct, I think, that having lost joint convexity we don't
> have strong convergence guarantees, but practically speaking, we get
> convergence.
>
> Regarding the side information, I thought it was there but may have made a
> mistake.
>
> Sorry for being dense about your message.
>

Re: LatentLogLinear code

Posted by Ted Dunning <te...@gmail.com>.
If you minimize squared error of a reconstruction, then you inherently have
orthogonal vectors
(for the first several, anyway).

It doesn't matter what order you do it in if you actually reach the
quadratic optimum.

The SGD approach will have issues with the order of the vectors produced.
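
(In standard SVD notation, none of which refers to the code, the
squared-error point is essentially the Eckart-Young theorem: for a matrix A
with singular values \sigma_1 \ge \cdots \ge \sigma_n,

  \min_{\mathrm{rank}(B) \le r} \|A - B\|_F
    = \|A - U_r \Sigma_r V_r^T\|_F
    = \sqrt{\sigma_{r+1}^2 + \cdots + \sigma_n^2}

so the optimal rank-r reconstruction is the truncated SVD, whose canonical
factors U_r and V_r have orthogonal columns. Any other factorization P Q^T
of that same optimum spans the same top-r singular subspaces, but P and Q
are only fixed up to an invertible transform, which is one reason an SGD
fit need not come back with its columns ordered by singular value.)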

On Tue, May 24, 2011 at 6:13 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Or does incremental SVD provide orthogonality of the singular vectors
> while LLL does not? (My best guess as to why they do it incrementally.)
>
> On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> > Thanks, Ted.
> >
> > On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >> Heh?
> >>
> >> Are you referring to the Log linear latent factor code that I have in my
> >> mahout-525 github repo?
> >>
> >
> > I am referring to the LatentLogLinear class in your repo, under the lll branch.
> >
> >>
> >>
> >>> However, I never understood why this factorization must come up
> >>> with the r best factors. I understand the incremental SVD approach
> >>> (essentially the same thing, except that learning factors one at a
> >>> time guarantees we capture the best ones), but if we do it all in
> >>> parallel, does it do any good in your trials?
> >>>
> >>
> >> I don't understand the question.
> >>
> >> Are you asking whether the random projection code finds the best
> (largest)
> >> singular
> >> values and corresponding vectors?  If so, the answer is yes, it does
> with
> >> high probability
> >> of low error.
> >>
> >
> > Well, you have an alternating scheme there, right? You learn the
> > left singular vectors, then you switch and find the right singular
> > vectors, but as far as I can tell you are not doing it the same way
> > incremental SVD does.
> >
> > Incremental SVD goes through the entire dataset the same way, but
> > only for one factor first. Then it freezes that factor once the test
> > RMSE curve is flat and starts doing the same for the second one.
> > Intuitively it's clear that the first pass this way finds the largest
> > factor, the next one finds the next largest, etc. Hence there's a
> > 'step' curve on the RMSE chart for this process as it switches from
> > factor to factor.
> >
> > But in your case, it looks like you are learning all the factors at
> > once. Is it going to produce the same result as the incremental SVD
> > algorithm? If yes, why did they even do it incrementally, since it's
> > clear the incremental approach would require more iterations?
> >
> > (There's a Mahout issue for an incremental SVD implementation, btw.)
> >
> >>
> >> Regarding the side information, I thought it was there but may have made
> a
> >> mistake.
> >>
> >
> > Which branch should I look at for the latest code? I looked at the
> > LLL branch.
> > Thanks.
> >
>

Re: LatentLogLinear code

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK, thanks. Just wanted to confirm that it is good practice. Sounds
like it is.



On Tue, May 24, 2011 at 6:21 PM, Ted Dunning <te...@gmail.com> wrote:
> I couldn't possibly comment.  If you were actually achieving the
> minimum, the results should have been the same, subject to order and
> sign changes.
>
> On Tue, May 24, 2011 at 6:15 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> The reason I am asking is that I actually did that very scheme some
>> time ago and for whatever reason I got very mixed results. I assumed
>> that was because I'd be better off doing it incrementally.
>>
>>
>> On Tue, May 24, 2011 at 6:13 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> wrote:
>> > Or does incremental SVD provide orthogonality of the singular vectors
>> > while LLL does not? (My best guess as to why they do it incrementally.)
>> >
>> > On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> wrote:
>> >> Thanks, Ted.
>> >>
>> >> On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com>
>> wrote:
>> >>> Heh?
>> >>>
>> >>> Are you referring to the Log linear latent factor code that I have in
>> my
>> >>> mahout-525 github repo?
>> >>>
>> >>
>> >> I am referring to the LatentLogLinear class in your repo, under the lll branch.
>> >>
>> >>>
>> >>>
>> >>>> However, I never understood why this factorization must come up
>> >>>> with the r best factors. I understand the incremental SVD
>> >>>> approach (essentially the same thing, except that learning
>> >>>> factors one at a time guarantees we capture the best ones), but
>> >>>> if we do it all in parallel, does it do any good in your trials?
>> >>>>
>> >>>
>> >>> I don't understand the question.
>> >>>
>> >>> Are you asking whether the random projection code finds the best
>> (largest)
>> >>> singular
>> >>> values and corresponding vectors?  If so, the answer is yes, it does
>> with
>> >>> high probability
>> >>> of low error.
>> >>>
>> >>
>> >> Well, you have an alternating scheme there, right? You learn the
>> >> left singular vectors, then you switch and find the right singular
>> >> vectors, but as far as I can tell you are not doing it the same
>> >> way incremental SVD does.
>> >>
>> >> Incremental SVD goes through the entire dataset the same way, but
>> >> only for one factor first. Then it freezes that factor once the
>> >> test RMSE curve is flat and starts doing the same for the second
>> >> one. Intuitively it's clear that the first pass this way finds the
>> >> largest factor, the next one finds the next largest, etc. Hence
>> >> there's a 'step' curve on the RMSE chart for this process as it
>> >> switches from factor to factor.
>> >>
>> >> But in your case, it looks like you are learning all the factors
>> >> at once. Is it going to produce the same result as the incremental
>> >> SVD algorithm? If yes, why did they even do it incrementally, since
>> >> it's clear the incremental approach would require more iterations?
>> >>
>> >> (There's a Mahout issue for an incremental SVD implementation, btw.)
>> >>
>> >>>
>> >>> Regarding the side information, I thought it was there but may have
>> made a
>> >>> mistake.
>> >>>
>> >>
>> >> Which branch should I look at for the latest code? I looked at
>> >> the LLL branch.
>> >> Thanks.
>> >>
>> >
>>
>

Re: LatentLogLinear code

Posted by Ted Dunning <te...@gmail.com>.
I couldn't possibly comment.  If you were actually achieving the
minimum, the results should have been the same, subject to order and
sign changes.

On Tue, May 24, 2011 at 6:15 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> The reason I am asking is that I actually did that very scheme some
> time ago and for whatever reason I got very mixed results. I assumed
> that was because I'd be better off doing it incrementally.
>
>
> On Tue, May 24, 2011 at 6:13 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> > Or does incremental SVD provide orthogonality of the singular vectors
> > while LLL does not? (My best guess as to why they do it incrementally.)
> >
> > On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> >> Thanks, Ted.
> >>
> >> On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>> Heh?
> >>>
> >>> Are you referring to the Log linear latent factor code that I have in
> my
> >>> mahout-525 github repo?
> >>>
> >>
> >> I am referring to the LatentLogLinear class in your repo, under the lll branch.
> >>
> >>>
> >>>
> >>>> However, I never understood why this factorization must come up
> >>>> with the r best factors. I understand the incremental SVD approach
> >>>> (essentially the same thing, except that learning factors one at a
> >>>> time guarantees we capture the best ones), but if we do it all in
> >>>> parallel, does it do any good in your trials?
> >>>>
> >>>
> >>> I don't understand the question.
> >>>
> >>> Are you asking whether the random projection code finds the best
> (largest)
> >>> singular
> >>> values and corresponding vectors?  If so, the answer is yes, it does
> with
> >>> high probability
> >>> of low error.
> >>>
> >>
> >> Well, you have an alternating scheme there, right? You learn the
> >> left singular vectors, then you switch and find the right singular
> >> vectors, but as far as I can tell you are not doing it the same way
> >> incremental SVD does.
> >>
> >> Incremental SVD goes through the entire dataset the same way, but
> >> only for one factor first. Then it freezes that factor once the test
> >> RMSE curve is flat and starts doing the same for the second one.
> >> Intuitively it's clear that the first pass this way finds the largest
> >> factor, the next one finds the next largest, etc. Hence there's a
> >> 'step' curve on the RMSE chart for this process as it switches from
> >> factor to factor.
> >>
> >> But in your case, it looks like you are learning all the factors at
> >> once. Is it going to produce the same result as the incremental SVD
> >> algorithm? If yes, why did they even do it incrementally, since it's
> >> clear the incremental approach would require more iterations?
> >>
> >> (There's a Mahout issue for an incremental SVD implementation, btw.)
> >>
> >>>
> >>> Regarding the side information, I thought it was there but may have
> made a
> >>> mistake.
> >>>
> >>
> >> Which branch should I look at for the latest code? I looked at the
> >> LLL branch.
> >> Thanks.
> >>
> >
>

Re: LatentLogLinear code

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
The reason I am asking is that I actually did that very scheme some
time ago and for whatever reason I got very mixed results. I assumed
that was because I'd be better off doing it incrementally.


On Tue, May 24, 2011 at 6:13 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> Or does incremental SVD provide orthogonality of the singular vectors
> while LLL does not? (My best guess as to why they do it incrementally.)
>
> On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>> Thanks, Ted.
>>
>> On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com> wrote:
>>> Heh?
>>>
>>> Are you referring to the Log linear latent factor code that I have in my
>>> mahout-525 github repo?
>>>
>>
>> I am referring to the LatentLogLinear class in your repo, under the lll branch.
>>
>>>
>>>
>>>> However, I never understood why this factorization must come up
>>>> with the r best factors. I understand the incremental SVD approach
>>>> (essentially the same thing, except that learning factors one at a
>>>> time guarantees we capture the best ones), but if we do it all in
>>>> parallel, does it do any good in your trials?
>>>>
>>>
>>> I don't understand the question.
>>>
>>> Are you asking whether the random projection code finds the best (largest)
>>> singular
>>> values and corresponding vectors?  If so, the answer is yes, it does with
>>> high probability
>>> of low error.
>>>
>>
>> Well, you have an alternating scheme there, right? You learn the
>> left singular vectors, then you switch and find the right singular
>> vectors, but as far as I can tell you are not doing it the same way
>> incremental SVD does.
>>
>> Incremental SVD goes through the entire dataset the same way, but
>> only for one factor first. Then it freezes that factor once the test
>> RMSE curve is flat and starts doing the same for the second one.
>> Intuitively it's clear that the first pass this way finds the largest
>> factor, the next one finds the next largest, etc. Hence there's a
>> 'step' curve on the RMSE chart for this process as it switches from
>> factor to factor.
>>
>> But in your case, it looks like you are learning all the factors at
>> once. Is it going to produce the same result as the incremental SVD
>> algorithm? If yes, why did they even do it incrementally, since it's
>> clear the incremental approach would require more iterations?
>>
>> (There's a Mahout issue for an incremental SVD implementation, btw.)
>>
>>>
>>> Regarding the side information, I thought it was there but may have made a
>>> mistake.
>>>
>>
>> Which branch should I look at for the latest code? I looked at the LLL branch.
>> Thanks.
>>
>

Re: LatentLogLinear code

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Or does incremental SVD provide orthogonality of the singular vectors
while LLL does not? (My best guess as to why they do it incrementally.)

On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> Thanks, Ted.
>
> On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com> wrote:
>> Heh?
>>
>> Are you referring to the Log linear latent factor code that I have in my
>> mahout-525 github repo?
>>
>
> I am referring to the LatentLogLinear class in your repo, under the lll branch.
>
>>
>>
>>> However, I never understood why this factorization must come up
>>> with the r best factors. I understand the incremental SVD approach
>>> (essentially the same thing, except that learning factors one at a
>>> time guarantees we capture the best ones), but if we do it all in
>>> parallel, does it do any good in your trials?
>>>
>>
>> I don't understand the question.
>>
>> Are you asking whether the random projection code finds the best (largest)
>> singular
>> values and corresponding vectors?  If so, the answer is yes, it does with
>> high probability
>> of low error.
>>
>
> Well, you have an alternating scheme there, right? You learn the
> left singular vectors, then you switch and find the right singular
> vectors, but as far as I can tell you are not doing it the same way
> incremental SVD does.
>
> Incremental SVD goes through the entire dataset the same way, but
> only for one factor first. Then it freezes that factor once the test
> RMSE curve is flat and starts doing the same for the second one.
> Intuitively it's clear that the first pass this way finds the largest
> factor, the next one finds the next largest, etc. Hence there's a
> 'step' curve on the RMSE chart for this process as it switches from
> factor to factor.
>
> But in your case, it looks like you are learning all the factors at
> once. Is it going to produce the same result as the incremental SVD
> algorithm? If yes, why did they even do it incrementally, since it's
> clear the incremental approach would require more iterations?
>
> (There's a Mahout issue for an incremental SVD implementation, btw.)
>
>>
>> Regarding the side information, I thought it was there but may have made a
>> mistake.
>>
>
> Which branch should I look at for the latest code? I looked at the LLL branch.
> Thanks.
>

Re: LatentLogLinear code

Posted by Ted Dunning <te...@gmail.com>.
The LLL branch is correct.

On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> > Regarding the side information, I thought it was there but may have made
> a
> > mistake.
> >
>
> Which branch should I look at for the latest code? I looked at the LLL branch.
> Thanks.

Re: LatentLogLinear code

Posted by Ted Dunning <te...@gmail.com>.
On Tue, May 24, 2011 at 6:11 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> > Are you asking whether the random projection code finds the best
> (largest)
> > singular
> > values and corresponding vectors?  If so, the answer is yes, it does with
> > high probability
> > of low error.
> >
>
> Well, you have an alternating scheme there, right? You learn the
> left singular vectors, then you switch and find the right singular
> vectors, but as far as I can tell you are not doing it the same way
> incremental SVD does.
>

Well, I should be alternating between the side information and the latent
factors.

This is not really an SVD implementation at all.  It is a log linear latent
variable implementation.

Whether it will reach the same results isn't clear to me.  The SGD
implementation that it uses
should definitely converge, however.

There are analogies with SVD computation of course.


>
> Incremental SVD goes through the entire dataset the same way, but
> only for one factor first. Then it freezes that factor once the test
> RMSE curve is flat and starts doing the same for the second one.
> Intuitively it's clear that the first pass this way finds the largest
> factor, the next one finds the next largest, etc. Hence there's a
> 'step' curve on the RMSE chart for this process as it switches from
> factor to factor.
>

Actually it isn't even clear that this converges to the correct eigenvectors
if two eigenvalues
are nearly the same size.

Regardless, the SGD algorithm is basically just a matter of inverting the
loops.  Instead of
looping first on eigenvector, then on examples, I iterated on examples, then
on eigenvector.

They should be similar.
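
As a rough illustration of the two loop orders on (user, item, value)
triples, here is a hypothetical sketch with made-up names; it is not the
LatentLogLinear code or anything in Mahout, just the shape of the two
schedules:

import java.util.List;
import java.util.Random;

// Hypothetical sketch contrasting the two loop orders; not Mahout code.
class LoopOrderSketch {
  static class Rating {
    final int user, item;
    final double value;
    Rating(int user, int item, double value) {
      this.user = user; this.item = item; this.value = value;
    }
  }

  static final double RATE = 0.002;   // SGD learning rate (illustrative value)
  static final double LAMBDA = 0.02;  // L2 regularization (illustrative value)

  // One SGD step on factor k of user matrix p and item matrix q for a single
  // observed cell; the prediction uses the first activeRank factors.
  static void sgdStep(double[][] p, double[][] q, Rating r, int k, int activeRank) {
    double pred = 0;
    for (int j = 0; j < activeRank; j++) pred += p[r.user][j] * q[r.item][j];
    double err = r.value - pred;
    double pk = p[r.user][k];
    double qk = q[r.item][k];
    p[r.user][k] += RATE * (err * qk - LAMBDA * pk);
    q[r.item][k] += RATE * (err * pk - LAMBDA * qk);
  }

  // Incremental ("factor first") order: finish factor k over the whole data
  // set before touching factor k + 1.
  static void trainIncrementally(List<Rating> data, double[][] p, double[][] q, int epochs) {
    for (int k = 0; k < p[0].length; k++)
      for (int e = 0; e < epochs; e++)
        for (Rating r : data) sgdStep(p, q, r, k, k + 1);
  }

  // Inverted ("example first") order: every example updates all factors in
  // the same pass, so all factors are learned at once.
  static void trainAllAtOnce(List<Rating> data, double[][] p, double[][] q, int epochs) {
    for (int e = 0; e < epochs; e++)
      for (Rating r : data)
        for (int k = 0; k < p[0].length; k++) sgdStep(p, q, r, k, p[0].length);
  }

  // Small random initialization so neither factor matrix starts at a saddle point.
  static double[][] randomInit(int rows, int cols, Random rnd) {
    double[][] m = new double[rows][cols];
    for (double[] row : m)
      for (int j = 0; j < cols; j++) row[j] = 0.1 * rnd.nextGaussian();
    return m;
  }
}

Each observed cell contributes one gradient step per factor in both cases;
only the order in which those steps are taken differs.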



>
> But in your case, it looks like you are learning all the factors at
> once. Is it going to produce the same result as the incremental SVD
> algorithm? If yes, why did they even do it incrementally, since it's
> clear the incremental approach would require more iterations?
>

Nothing is all that clear with these algorithms.  Speed varies more with
memory bandwidth than with the number of operations.  My guess is that
the SGD algorithm is a bit better in this regard.

Re: LatentLogLinear code

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Thanks, Ted.

On Tue, May 24, 2011 at 5:44 PM, Ted Dunning <te...@gmail.com> wrote:
> Heh?
>
> Are you referring to the Log linear latent factor code that I have in my
> mahout-525 github repo?
>

I am referring to the LatentLogLinear class in your repo, under the lll branch.

>
>
>> However, I never understood why this factorization must come up with
>> the r best factors. I understand the incremental SVD approach
>> (essentially the same thing, except that learning factors one at a
>> time guarantees we capture the best ones), but if we do it all in
>> parallel, does it do any good in your trials?
>>
>
> I don't understand the question.
>
> Are you asking whether the random projection code finds the best (largest)
> singular
> values and corresponding vectors?  If so, the answer is yes, it does with
> high probability
> of low error.
>

Well, you have an alternating scheme there, right? You learn the left
singular vectors, then you switch and find the right singular vectors,
but as far as I can tell you are not doing it the same way incremental
SVD does.

Incremental SVD goes through the entire dataset the same way, but only
for one factor first. Then it freezes that factor once the test RMSE
curve is flat and starts doing the same for the second one. Intuitively
it's clear that the first pass this way finds the largest factor, the
next one finds the next largest, etc. Hence there's a 'step' curve on
the RMSE chart for this process as it switches from factor to factor.
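
That schedule, roughly, as a hypothetical sketch (not Mahout code; the
single SGD pass over the training data and the held-out RMSE evaluation are
supplied as callbacks):

import java.util.function.IntConsumer;
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of the per-factor ('step' curve) schedule; not Mahout code.
class StepwiseTrainer {
  // rank:            number of latent factors to learn
  // onePassOverData: runs one SGD pass over the training data, updating only
  //                  the given factor index
  // heldOutRmse:     evaluates RMSE on held-out data using factors 0..k
  // tolerance:       minimum RMSE improvement required to keep iterating
  static void train(int rank, IntConsumer onePassOverData,
                    IntToDoubleFunction heldOutRmse, double tolerance) {
    for (int k = 0; k < rank; k++) {             // one factor at a time
      double previous = Double.POSITIVE_INFINITY;
      while (true) {
        onePassOverData.accept(k);               // SGD touches only factor k
        double rmse = heldOutRmse.applyAsDouble(k);
        if (previous - rmse < tolerance) break;  // curve is flat: freeze factor k
        previous = rmse;
      }
      // Moving on to factor k + 1 starts the next 'step' on the RMSE chart.
    }
  }
}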

But in your case, it looks like you are learning all the factors at
once. Is it going to produce the same result as the incremental SVD
algorithm? If yes, why did they even do it incrementally, since it's
clear the incremental approach would require more iterations?

(There's a Mahout issue for an incremental SVD implementation, btw.)

>
> Regarding the side information, I thought it was there but may have made a
> mistake.
>

Which branch should I look at for the latest code? I looked at the LLL branch.
Thanks.

Re: LatentLogLinear code

Posted by Ted Dunning <te...@gmail.com>.
Heh?

On Tue, May 24, 2011 at 5:12 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> So you do in-memory latent factor computation? I think this is the
> same technique Koren implied for learning latent factors.
>

Are you referring to the Log linear latent factor code that I have in my
mahout-525 github repo?

Or SVD (L-2 latent factor computations)?

Or LDA (multinomial latent factor computation)?

Or Dirichlet process clustering?

Or ...?


> However, I never understood why this factorization must come up with
> the r best factors. I understand the incremental SVD approach
> (essentially the same thing, except that learning factors one at a
> time guarantees we capture the best ones), but if we do it all in
> parallel, does it do any good in your trials?
>

I don't understand the question.

Are you asking whether the random projection code finds the best (largest)
singular
values and corresponding vectors?  If so, the answer is yes, it does with
high probability
of low error.

But this doesn't sound like LLL stuff.


> Also, I thought the cold start problem is helped by the fact that we
> learn the weights first, so on their own they always give the best
> approximation, and then the user-item interactions reveal specifics
> about the user and the item. However, if we learn them all at the same
> time, it does not seem obvious to me that we'd be learning the best
> approximation when the latent factors are unknown (new users). Also,
> in that implementation I can't see the side info training at all -- is
> it there?
>

This sounds like LLL again.

In LLL, the optimization of side information coefficients and the
optimization of the user-item interactions are separately convex, but not
jointly convex.  This means that you pretty much have to proceed by
optimizing one, then the other, then the first and so on.
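
Schematically, the alternation looks like this; the interface and method
names here are made up for illustration and do not match the actual
LatentLogLinear class:

// Hypothetical sketch of the alternating (block coordinate) optimization; not
// the real LatentLogLinear code.
class AlternatingFitSketch {
  interface Model {
    // Each sub-problem is convex when the other block of parameters is frozen.
    void updateSideInfoWeights(Iterable<double[]> examples); // latent factors fixed
    void updateLatentFactors(Iterable<double[]> examples);   // side-info weights fixed
    double heldOutLoss(Iterable<double[]> heldOut);
  }

  static void fit(Model model, Iterable<double[]> train, Iterable<double[]> heldOut,
                  int maxOuterIterations, double tolerance) {
    double previous = Double.POSITIVE_INFINITY;
    for (int i = 0; i < maxOuterIterations; i++) {
      model.updateSideInfoWeights(train);  // convex step 1
      model.updateLatentFactors(train);    // convex step 2
      double loss = model.heldOutLoss(heldOut);
      // No joint convexity, so there is no global guarantee; stop when the
      // alternation stops improving in practice.
      if (previous - loss < tolerance) break;
      previous = loss;
    }
  }
}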

So I don't see how we are learning them all at the same time.

You are correct, I think, that having lost joint convexity we don't
have strong convergence guarantees, but practically speaking, we get
convergence.

Regarding the side information, I thought it was there but may have made a
mistake.

Sorry for being dense about your message.