Posted to user@mahout.apache.org by Koobas <ko...@gmail.com> on 2013/09/04 19:07:10 UTC

ALS and SVD feature vectors

In ALS the coincidence matrix is approximated by XY',
where X is user-feature, Y is item-feature.
Now, here is the question:
are/should the feature vectors be normalized before computing
recommendations?

Now, what happens in the case of SVD?
The vectors are normal by definition.
Are singular values used at all, or just left and right singular vectors?
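
To make the setup concrete, here is a minimal numpy sketch of the
factorization being asked about (dimensions and values are made up for
illustration; this is not Mahout code):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2))   # user-feature matrix: 4 users, 2 features
Y = rng.standard_normal((5, 2))   # item-feature matrix: 5 items, 2 features

# ALS approximates the preference matrix R by X @ Y.T, so the raw
# recommendation scores for user 0 are just row 0 of that product.
scores = X @ Y.T
print(scores[0].argsort()[::-1])  # item indices, best-scoring first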

Re: ALS and SVD feature vectors

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Sep 4, 2013 at 11:43 AM, Ted Dunning <te...@gmail.com> wrote:
> On Wed, Sep 4, 2013 at 10:59 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> > Now, what happens in the case of SVD?
>> > The vectors are normal by definition.
>> > Are singular values used at all, or just left and right singular vectors?
>>
>> SVD does not take weights, so it cannot ignore or down-weight a
>> non-observation, which is why it is not well suited for the matrix
>> completion problem per se.
>>
>
> There are multiple ways to read the use of weights here.
>
> In the original posting, I think the gist was how to treat the singular
> values, not how to weight different observations. Mahout's SSVD allows
> the singular values to be kept separate, to be applied entirely to the
> left or right singular vectors, or to be split across both in a
> square-root sort of way.

From a purely SSVD point of view, it depends on what is requested. Yes,
one could compute three separate outputs, which is the default, or
outputs that bake the singular values into either side, or the square
roots of the singular values into both sides. (I have never used the
latter myself, but papers mention it as a means to build useful
similarities between a document and a term in the LSA case.)
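
To illustrate the three placements in plain numpy (the matrix here is
random; this is only a sketch of the algebra, not of Mahout's SSVD
output format):

import numpy as np

A = np.random.default_rng(1).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # three separate outputs

bake_left  = U * s                   # U @ diag(s): values folded into the left side
bake_right = s[:, None] * Vt         # diag(s) @ V': values folded into the right side
split_l    = U * np.sqrt(s)          # sqrt(s) folded into each side
split_r    = np.sqrt(s)[:, None] * Vt

# All three placements reconstruct the same A.
assert np.allclose(bake_left @ Vt, A)
assert np.allclose(U @ bake_right, A)
assert np.allclose(split_l @ split_r, A)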

Re: ALS and SVD feature vectors

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Sep 4, 2013 at 10:59 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> > Now, what happens in the case of SVD?
> > The vectors are normal by definition.
> > Are singular values used at all, or just left and right singular vectors?
>
> SVD does not take weights, so it cannot ignore or down-weight a
> non-observation, which is why it is not well suited for the matrix
> completion problem per se.
>

There are multiple ways to read the use of weights here.

In the original posting, I think the gist was how to treat the singular
values, not how to weight different observations. Mahout's SSVD allows
the singular values to be kept separate, to be applied entirely to the
left or right singular vectors, or to be split across both in a
square-root sort of way.

Re: ALS and SVD feature vectors

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Sep 4, 2013 at 11:33 AM, Koobas <ko...@gmail.com> wrote:
> Let me rephrase.
> Suppose I did ALS decomposition of a matrix.
> Suppose I don't want to produce recommendations
>   (by calculating XY').
> Suppose I want to find users with similar preferences
>   (by calculating XX').
> Should the correlation of a user with himself be 1.0?

I take it you meant dot-with-self, not really correlation (in the
Pearson sense). The answer is no, and dot-with-self is not a [good]
measure of similarity, for the reasons you've mentioned, if nothing
else. People use cosine similarity, or plenty of other metrics such as
Tanimoto distance, to assess similarity in the user space and get a
more realistic similarity measure. But I am not sure Mahout directly
assesses user-user similarities; I think things like RowSimilarityJob
are really user-product only.
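
For example, cosine similarity over the user rows is just a row
normalization followed by XX' (X below is a made-up user-feature
matrix, not anything produced by Mahout):

import numpy as np

X = np.random.default_rng(2).standard_normal((5, 3))
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # rows scaled to unit length
cosine = Xn @ Xn.T                                 # user-user cosine similarities

# Self-similarity is exactly 1.0, and no off-diagonal entry exceeds it.
assert np.allclose(np.diag(cosine), 1.0)
assert (cosine <= 1.0 + 1e-12).all()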

>
> If the answer is "yes", that means that the user-feature
> vectors in X should be normalized, i.e., scaled to have
> the length of 1.0.
>
> If the answer is "no" then a user can possibly correlate
> stronger with another user than himself.
>
> Which should it be?
> Which one is the case in Mahout?
>
>
> On Wed, Sep 4, 2013 at 1:59 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> On Wed, Sep 4, 2013 at 10:07 AM, Koobas <ko...@gmail.com> wrote:
>> > In ALS the coincidence matrix is approximated by XY',
>> > where X is user-feature, Y is item-feature.
>> > Now, here is the question:
>> > are/should the feature vectors be normalized before computing
>> > recommendations?
>>
>> If it is a coincidence matrix in the sense that there are just 0's
>> and 1's, then no, it shouldn't (imo). However, if there is a case of
>> non-observations, then things are a little more complicated: the
>> preference is still 0 or 1, but there are confidence weights. The
>> weights (no-observation weight vs. degree of consumption) are usually
>> advised to be determined via (cross-)validation. However, at this
>> point Mahout does not support cross-validation of those parameters,
>> so usually people use some guesswork (see the Hu-Koren-Volinsky paper
>> about implicit feedback datasets).
>> >
>> > Now, what happens in the case of SVD?
>> > The vectors are normal by definition.
>> > Are singular values used at all, or just left and right singular vectors?
>>
>> SVD does not take weights, so it cannot ignore or down-weight a
>> non-observation, which is why it is not well suited for the matrix
>> completion problem per se.
>>

Re: ALS and SVD feature vectors

Posted by Koobas <ko...@gmail.com>.
Let me rephrase.
Suppose I did ALS decomposition of a matrix.
Suppose I don't want to produce recommendations
  (by calculating XY').
Suppose I want to find users with similar preferences
  (by calculating XX').
Should the correlation of a user with himself be 1.0?

If the answer is "yes", that means that the user-feature
vectors in X should be normalized, i.e., scaled to have
the length of 1.0.

If the answer is "no" then a user can possibly correlate
stronger with another user than himself.

Which should it be?
Which one is the case in Mahout?
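
To make the "no" case concrete, here is a contrived two-user example
where a user's raw dot product with a neighbor beats the dot product
with himself:

import numpy as np

x_alice = np.array([1.0, 1.0])   # short feature vector: alice . alice = 2.0
x_bob   = np.array([3.0, 3.0])   # same direction, three times longer

print(x_alice @ x_alice)  # 2.0
print(x_alice @ x_bob)    # 6.0 -- alice "matches" bob better than herself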


On Wed, Sep 4, 2013 at 1:59 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> On Wed, Sep 4, 2013 at 10:07 AM, Koobas <ko...@gmail.com> wrote:
> > In ALS the coincidence matrix is approximated by XY',
> > where X is user-feature, Y is item-feature.
> > Now, here is the question:
> > are/should the feature vectors be normalized before computing
> > recommendations?
>
> If it is a coincidence matrix in the sense that there are just 0's
> and 1's, then no, it shouldn't (imo). However, if there is a case of
> non-observations, then things are a little more complicated: the
> preference is still 0 or 1, but there are confidence weights. The
> weights (no-observation weight vs. degree of consumption) are usually
> advised to be determined via (cross-)validation. However, at this
> point Mahout does not support cross-validation of those parameters,
> so usually people use some guesswork (see the Hu-Koren-Volinsky paper
> about implicit feedback datasets).
> >
> > Now, what happens in the case of SVD?
> > The vectors are normal by definition.
> > Are singular values used at all, or just left and right singular vectors?
>
> SVD does not take weights, so it cannot ignore or down-weight a
> non-observation, which is why it is not well suited for the matrix
> completion problem per se.
>

Re: ALS and SVD feature vectors

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Sep 4, 2013 at 10:07 AM, Koobas <ko...@gmail.com> wrote:
> In ALS the coincidence matrix is approximated by XY',
> where X is user-feature, Y is item-feature.
> Now, here is the question:
> are/should the feature vectors be normalized before computing
> recommendations?

If it is a coincidence matrix in the sense that there are just 0's and
1's, then no, it shouldn't (imo). However, if there is a case of
non-observations, then things are a little more complicated: the
preference is still 0 or 1, but there are confidence weights. The
weights (no-observation weight vs. degree of consumption) are usually
advised to be determined via (cross-)validation. However, at this point
Mahout does not support cross-validation of those parameters, so
usually people use some guesswork (see the Hu-Koren-Volinsky paper
about implicit feedback datasets).
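
For reference, the weighting scheme from that paper looks roughly like
the following per-user solve (alpha and lambda are exactly the
guesswork parameters in question; the values below are arbitrary, and
this is a sketch of the math, not of Mahout's ALS job):

import numpy as np

alpha, lam, k = 40.0, 0.1, 2
r_u = np.array([0.0, 3.0, 0.0, 1.0])   # one user's raw interaction counts
p_u = (r_u > 0).astype(float)          # binary preference: 0 or 1
c_u = 1.0 + alpha * r_u                # confidence: non-observations keep weight 1

Y = np.random.default_rng(3).standard_normal((4, k))  # item-feature matrix

# Confidence-weighted ridge solve for this user's feature vector:
# x_u = (Y' C Y + lam I)^-1 Y' C p_u, with C = diag(c_u)
C = np.diag(c_u)
x_u = np.linalg.solve(Y.T @ C @ Y + lam * np.eye(k), Y.T @ C @ p_u)
print(x_u)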
>
> Now, what happens in the case of SVD?
> The vectors are normal by definition.
> Are singular values used at all, or just left and right singular vectors?

SVD does not take weights, so it cannot ignore or down-weight a
non-observation, which is why it is not well suited for the matrix
completion problem per se.
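
A tiny illustration of that limitation: plain SVD has no notion of
"missing", so an unobserved cell goes in as a literal zero and the
truncated reconstruction fits it like any real rating:

import numpy as np

R = np.array([[5.0, 0.0],    # the 0.0 stands for "not observed"...
              [4.0, 1.0]])
U, s, Vt = np.linalg.svd(R)
R1 = s[0] * np.outer(U[:, 0], Vt[0])  # rank-1 truncated reconstruction
print(R1)  # ...but it gets approximated as an actual rating of zero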

Re: ALS and SVD feature vectors

Posted by Koobas <ko...@gmail.com>.
On Wed, Sep 4, 2013 at 3:06 PM, Sean Owen <sr...@gmail.com> wrote:

> The feature vectors? Rows of X and Y? No, they definitely should not
> be normalized. That would change the approximation you so carefully
> built quite a lot.
>
> As you say, U and V are orthonormal in the SVD. But you still multiply
> all of them together with Sigma when making recs. (Or you embed Sigma
> in U and V.) So yes, the singular values are used; they give proper
> weights to the features.
>
> You can think of X and Y as being like that, with Sigma mixed in in
> some arbitrary way. Normalizing them would not be valid.
>
Excellent!
Straight to the point.
That's the answer I was looking for.
Also, thanks to Ted. He pretty much said the same thing.

>
> On Wed, Sep 4, 2013 at 6:07 PM, Koobas <ko...@gmail.com> wrote:
>
> > In ALS the coincidence matrix is approximated by XY',
> > where X is user-feature, Y is item-feature.
> > Now, here is the question:
> > are/should the feature vectors be normalized before computing
> > recommendations?
> >
> > Now, what happens in the case of SVD?
> > The vectors are normal by definition.
> > Are singular values used at all, or just left and right singular vectors?
> >
>

Re: ALS and SVD feature vectors

Posted by Sean Owen <sr...@gmail.com>.
The feature vectors? Rows of X and Y? No, they definitely should not be
normalized. That would change the approximation you so carefully built
quite a lot.

As you say, U and V are orthonormal in the SVD. But you still multiply
all of them together with Sigma when making recs. (Or you embed Sigma in
U and V.) So yes, the singular values are used; they give proper weights
to the features.

You can think of X and Y as being like that, with Sigma mixed in in some
arbitrary way. Normalizing them would not be valid.
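
A quick numpy check of the same point (random matrix; just a sketch):

import numpy as np

A = np.random.default_rng(4).standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose((U * s) @ Vt, A))  # True: Sigma embedded, approximation intact
print(np.allclose(U @ Vt, A))        # False: drop Sigma and the approximation is gone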


On Wed, Sep 4, 2013 at 6:07 PM, Koobas <ko...@gmail.com> wrote:

> In ALS the coincidence matrix is approximated by XY',
> where X is user-feature, Y is item-feature.
> Now, here is the question:
> are/should the feature vectors be normalized before computing
> recommendations?
>
> Now, what happens in the case of SVD?
> The vectors are normal by definition.
> Are singular values used at all, or just left and right singular vectors?
>