Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2011/11/07 00:53:38 UTC

Factorizing SVD (online SVDRecommender)

This thread is about the class SVDRecommender, which uses an externally
created factorization to do recommendation.

A: The Factorization classes do not extract the scaling diagonal matrix. Is
this embedded in the left or right matrix? Or spread across both?
B: Is there an explanation of why the dot product of two feature vectors
creates the preference? A paper or blog post? Or a paragraph in a reply?

-- 
Lance Norskog
goksron@gmail.com

Re: Factorizing SVD (online SVDRecommender)

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
I was also somewhat dubious about that name. Those algorithms are
factorizations all right, but they don't technically do what the
mathematical definition of SVD requires (even granting that in practice
the mathematically strict SVD contract may be relaxed). Even though some
published algorithms derive their names from SVD to emphasize the strong
connection (SVD++ comes to mind), they are still careful to point out
that they are not producing an actual mathematical SVD.
On Nov 8, 2011 4:03 PM, "Lance Norskog" <go...@gmail.com> wrote:

> Thank you. Maybe it should be called FactorizedRecommender instead, since
> there is no real SVD?
>

Re: Factorizing SVD (online SVDRecommender)

Posted by Lance Norskog <go...@gmail.com>.
Thank you. Maybe it should be called FactorizedRecommender instead, since
there is no real SVD?

-- 
Lance Norskog
goksron@gmail.com

Re: Factorizing SVD (online SVDRecommender)

Posted by Sebastian Schelter <ss...@apache.org>.
I think it's also important to note that in the recommendation world,
most of the time something that only looks like an SVD is used.

In most recommendation use cases, only a tiny fraction of the
user-item matrix is observed. Therefore "standard" algorithms for
computing the SVD cannot be used (at least that's what the papers say).

The usual approach is to decompose the rating matrix A (users x items)
into the product of two other matrices X (users x features) and Y (items
x features). A is approximated by XY'.

This decomposition is obtained by selecting X and Y in a way that
minimizes the overall squared error with respect to the observed ratings
(plus some regularization to avoid overfitting).

For each observed user-item pair (u,i), the predicted rating is the dot
product of the corresponding user and item vectors in the latent feature
space, and X and Y are chosen to minimize the sum of squared prediction
errors over the observed ratings:

f(X,Y) = sum_{(u,i) observed} (r_{u,i} - x_u' y_i)^2 + some regularization term
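
To make that concrete, here is a minimal sketch in plain Java (not
Mahout's actual code; the array layout, method names and the simple L2
penalty are my own assumptions) that evaluates this objective given the
observed ratings as (user, item, rating) triples:

// Sketch only: X[u] and Y[i] are the latent feature vectors of user u and
// item i, observed[n] = {user, item} indexes the n-th known rating and
// ratings[n] is its value, lambda is the regularization weight.
static double dot(double[] a, double[] b) {
  double sum = 0;
  for (int k = 0; k < a.length; k++) {
    sum += a[k] * b[k];
  }
  return sum;
}

static double loss(double[][] X, double[][] Y,
                   int[][] observed, double[] ratings, double lambda) {
  double error = 0;
  for (int n = 0; n < observed.length; n++) {
    int u = observed[n][0];
    int i = observed[n][1];
    double diff = ratings[n] - dot(X[u], Y[i]);  // r_{u,i} - x_u' y_i
    error += diff * diff;
  }
  double reg = 0;  // simple L2 penalty on all factor vectors
  for (double[] x : X) reg += dot(x, x);
  for (double[] y : Y) reg += dot(y, y);
  return error + lambda * reg;
}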

There are several ways to find the X and Y that minimize this function.

One approach is to use Stochastic Gradient Descent. This approach was
made popular by Simon Funk's famous article during the Netflix Prize:

"Netflix Update: Try this at home"
http://sifter.org/~simon/journal/20061211.html

I think that's also what's implemented in our
ExpectationMaximizationSVDFactorizer.
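
As a rough illustration (this is just a plain-Java sketch reusing the
arrays and the dot() helper from above, not the code of
ExpectationMaximizationSVDFactorizer), one SGD sweep visits every
observed rating and nudges the corresponding user and item vectors
against the prediction error:

// One stochastic gradient descent sweep over the observed ratings;
// learningRate and lambda are illustrative hyperparameters.
static void sgdSweep(double[][] X, double[][] Y,
                     int[][] observed, double[] ratings,
                     double learningRate, double lambda) {
  for (int n = 0; n < observed.length; n++) {
    int u = observed[n][0];
    int i = observed[n][1];
    double err = ratings[n] - dot(X[u], Y[i]);  // prediction error
    for (int k = 0; k < X[u].length; k++) {
      double xuk = X[u][k];
      double yik = Y[i][k];
      // step along the negative gradient, shrinking each factor
      // slightly for regularization
      X[u][k] = xuk + learningRate * (err * yik - lambda * xuk);
      Y[i][k] = yik + learningRate * (err * xuk - lambda * yik);
    }
  }
}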

Other approaches use a technique called "Alternating Least Squares",
where you iteratively fix either X or Y and solve a least-squares
problem for the other.

This approach is described in: "Large-scale Parallel Collaborative
Filtering for the Netflix Prize",
http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf

Our sequential ALSWRFactorizer is an implementation of this, as is the
recently refactored ParallelALSFactorizationJob, which brings the
algorithm to Hadoop.
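
For intuition, here is the user-side half of one ALS iteration as a
plain-Java sketch (again not the actual ALSWRFactorizer code; ALS-WR
additionally scales lambda by the number of ratings per user, which I
leave out for brevity). With Y fixed, each x_u is the solution of a
small k x k regularized least-squares system built from the items that
user has rated; the item-side update is symmetric:

// Recompute one user's factor vector with the item factors Y held fixed
// by solving (Y_u' Y_u + lambda * I) x_u = Y_u' r_u, where Y_u stacks the
// vectors of the items this user rated and r_u holds the ratings.
static double[] updateUser(double[][] Y, int[] ratedItems,
                           double[] userRatings, double lambda) {
  int k = Y[0].length;
  double[][] A = new double[k][k];
  double[] b = new double[k];
  for (int n = 0; n < ratedItems.length; n++) {
    double[] y = Y[ratedItems[n]];
    for (int p = 0; p < k; p++) {
      b[p] += userRatings[n] * y[p];
      for (int q = 0; q < k; q++) {
        A[p][q] += y[p] * y[q];
      }
    }
  }
  for (int p = 0; p < k; p++) {
    A[p][p] += lambda;  // regularization also keeps the system invertible
  }
  return solveSmallSystem(A, b);
}

// Naive Gaussian elimination without pivoting, good enough for a sketch
// because the regularized system above is symmetric positive definite.
static double[] solveSmallSystem(double[][] A, double[] b) {
  int k = b.length;
  for (int p = 0; p < k; p++) {
    for (int r = p + 1; r < k; r++) {
      double factor = A[r][p] / A[p][p];
      for (int c = p; c < k; c++) {
        A[r][c] -= factor * A[p][c];
      }
      b[r] -= factor * b[p];
    }
  }
  double[] x = new double[k];
  for (int p = k - 1; p >= 0; p--) {
    double sum = b[p];
    for (int c = p + 1; c < k; c++) {
      sum -= A[p][c] * x[c];
    }
    x[p] = sum / A[p][p];
  }
  return x;
}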

There is a second very interesting paper that shows how to use the ALS
approach with implicit feedback data: "Collaborative Filtering for
Implicit Feedback Datasets", http://research.yahoo.com/pub/2433

I think Tamas's sequential factorizer in MAHOUT-737 is a direct
implementation of this and I also plan to include it in
ParallelALSFactorizationJob.
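
For reference, the key change in the implicit-feedback formulation of
that paper (as I read it) is that the raw observations are turned into
binary preferences p_{u,i} (1 if the user interacted with the item, 0
otherwise) plus confidence weights c_{u,i} = 1 + alpha * r_{u,i}, and
ALS then minimizes

f(X,Y) = sum_{all u,i} c_{u,i} (p_{u,i} - x_u' y_i)^2
         + lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2)

where the sum runs over all user-item pairs, not only the observed ones.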

--sebastian


Re: Factorizing SVD (online SVDRecommender)

Posted by Sean Owen <sr...@gmail.com>.
It's embedded in both as far as I can tell, though I don't know enough
about the implementation to say what the split is. Ted remarked once
that it's usual to split sqrt(S) across both.

Dotting two vectors to get one rating is really just matrix
multiplication recovering one value of the approximate user-item matrix.
A = U S V', and we keep truncated versions of the matrices on the
right-hand side. Uk Sk V'k gives Ak, an approximation of the original A
(the input) but with many new values filled in, from which to make
recommendations. The Uk and V'k actually already have Sk embedded, so
getting one value of Ak is just the dot product of a row of Uk with a
column of V'k, as in ordinary matrix multiplication.
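
A tiny plain-Java sketch of that last step (the method and array names
are mine, not the actual SVDRecommender API): fold sqrt(S) into both
factor matrices once, after which estimating any single entry of Ak is a
dot product of a user row and an item row:

// Fold the square roots of the k retained singular values into both
// factor matrices, so each side carries "half" of Sk.
static void foldInSingularValues(double[][] userFeatures,
                                 double[][] itemFeatures,
                                 double[] singularValues) {
  for (int k = 0; k < singularValues.length; k++) {
    double s = Math.sqrt(singularValues[k]);
    for (double[] row : userFeatures) row[k] *= s;
    for (double[] row : itemFeatures) row[k] *= s;
  }
}

// One entry of Ak = Uk Sk Vk': the estimated preference of a user for an item.
static double estimatePreference(double[] userRow, double[] itemRow) {
  double sum = 0;
  for (int k = 0; k < userRow.length; k++) {
    sum += userRow[k] * itemRow[k];
  }
  return sum;
}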
