You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@mahout.apache.org by Chloe <ch...@gmail.com> on 2013/04/10 22:29:36 UTC

Fold-in for ALSWR

Hi everyone,

I am reaching out to the list requesting some help/advice on implementing 
fold-in with the Alternating Least Squares algo in Mahout, a problem on 
which I am stumped. I've read other posts on the list and over on SO, like:

http://stackoverflow.com/questions/12444231/svd-matrix-conditioning-
how-to-project-from-original-space-to-conditioned-spac
and
http://stackoverflow.com/questions/12857693/mahout-how-to-make-
recommendations-for-new-users
including the slide show talk posted in the last link. This slide show seems
to be the most relevant to my purposes, specifically slide 14, but I can't fully 
understand it.

At the end of ALS I have Ratings = A = UM', where U is user feature matrix 
and M is item feature matrix.
For an existing user with a new rating, I plan to update the Auser only,which
is (1 x number of items) by replacing/adding new rating at the appropriate
index and then do:
Uuser = Auser * V'   ( 1 x items)(items x features)
Sub Uuser back into U and get new recommendations

How do I address a NEW user? From the slide:
X = A (Y')^-1
or 
X = A Y ((Y'Y) ^-1)

Is the 2nd preferable? If so, why? I would appreciate a more 'wordy' 
explanation or someone pointing me to any papers I can read on this. Also,
as a slight offshoot, given that mahout doesn't have matrix inversion, is it
more advisable to use QR decomp to find a pseudoinverse or an outside 
package to do inversion?

Apologies for the long post which I did try and make concise, and thank 
you for any help you offer.

Chloe

Re: Fold-in for ALSWR

Posted by Sean Owen <sr...@gmail.com>.

I should say that it depends of course on what you are implementing.
You can also write an algorithm to factor R, not P. If you're doing
that, then I would not expect values to be so low. But I thought you
were following the version where you factor P = R != 0.

Multiplying by 3 and adding 1 would not do what you want, no.

On Tue, Apr 30, 2013 at 2:24 PM, Chloe <ch...@gmail.com> wrote:
> Thanks again for replying.
>
> I didn't expect that since I'm using explicit feedback, not implicit, but
> mostly because the part files in U/ and V/ multiplied together give me back
> predicted ratings on a 1-4 scale.
>
> Would converting the 0/1 connection indicator to a 1-4 scale be any sort of
> reasonable on capturing the strength of the connection or is that entirely
> unjustified?
>
> -Chloe
>

Re: Fold-in for ALSWR

Posted by Chloe <ch...@gmail.com>.

Thanks again for replying.

I didn't expect that since I'm using explicit feedback, not implicit, but
mostly because the part files in U/ and V/ multiplied together give me back
predicted ratings on a 1-4 scale. 

Would converting the 0/1 connection indicator to a 1-4 scale be any sort of
reasonable on capturing the strength of the connection or is that entirely
unjustified?  

-Chloe

Re: Fold-in for ALSWR

Posted by Sean Owen <sr...@gmail.com>.

ALS-WR is not predicting your input matrix R, but the matrix P which
is R != 0. It is not predicting ratings, but a 0/1 indicator of
whether the connection exists. So the values are usually in [0,1].

On Tue, Apr 30, 2013 at 2:40 AM, Chloe <ch...@gmail.com> wrote:
> Dear Sean,
>
> Thanks a lot for a quick and helpful reply. Having been sidetracked with
> another project, I revisited the problem I posed in my post over the weekend
> and, unfortunately, have a follow up question.
>
> The problem I'm facing with my implementation of your explanation is that
> the predicted ratings for new users seem to be on a very different scale
> than the original ratings the model is based on and I'm wondering what I've
> done wrong.
>
> To recap my steps in pseudocode:
> 1. Use a text file of ratings on 1-4 scale to generate my model afterward
> given by files U/part-m-00000 and V/part-m-00000, or Ratings = UV'.
>
> 2. Vector newRatings = new Vector(); ex. given 10 items a new user's ratings
> looks like {0,1,0,3,4,0,2,3,0,1}
>     Matrix Au = new DenseMatrix(newRatings.size(), 1);
>     Au.assignColumn(0, newRatings);
>     QRDecomposition qr = new QRDecomposition(V);  //item features
>     Matrix Xu = qr.solve(Au);
>     Matrix predictedUserRatingsForAllItems =
> (Xu.transpose()).times(V.transpose());
>
> 3. DenseVector predictedUserRatingsVector =
> (DenseVector)predictedUserRatingsForAllItems.viewRow(0);
>
> The "predictedUserRatingsVector" from step 3, however, gives a top 10 item
> result with scores ranging from 0.46-0.62. These numbers go up with the
> number of new items rated. Which means that even for item 5, given the
> highest possible score of 4, this approach can't even give back a rating for
> a rated item close to its actual value.
> Moreover, the new user's ratings I test, {0,1,0,3,4,0,2,3,0,1}, are actually
> identical to an existing user that was used to build the model and whose
> predicted ratings are very reasonable, looking like
> {0.5,0.98,0.89,3.23,4.1,1.01,2.32,2.99,3.5,1.1}.
>
> I must be doing something wrong or missing something. Is there anything you
> or anyone from the list with fold-in experience can suggest I try or
> consider that would explain why this is happening? I expected that predicted
> ratings from fold-in would not be as good as regenerating the model, but not
> this bad.
>
> Many thanks,
> Chloe
>
>
>
>

Re: Fold-in for ALSWR

Posted by Chloe <ch...@gmail.com>.

Dear Sean,

Thanks a lot for a quick and helpful reply. Having been sidetracked with
another project, I revisited the problem I posed in my post over the weekend
and, unfortunately, have a follow up question.

The problem I'm facing with my implementation of your explanation is that
the predicted ratings for new users seem to be on a very different scale
than the original ratings the model is based on and I'm wondering what I've
done wrong.

To recap my steps in pseudocode:
1. Use a text file of ratings on 1-4 scale to generate my model afterward
given by files U/part-m-00000 and V/part-m-00000, or Ratings = UV'.

2. Vector newRatings = new Vector(); ex. given 10 items a new user's ratings
looks like {0,1,0,3,4,0,2,3,0,1}
    Matrix Au = new DenseMatrix(newRatings.size(), 1);
    Au.assignColumn(0, newRatings);  
    QRDecomposition qr = new QRDecomposition(V);  //item features
    Matrix Xu = qr.solve(Au);
    Matrix predictedUserRatingsForAllItems =
(Xu.transpose()).times(V.transpose()); 
   
3. DenseVector predictedUserRatingsVector =
(DenseVector)predictedUserRatingsForAllItems.viewRow(0);

The "predictedUserRatingsVector" from step 3, however, gives a top 10 item
result with scores ranging from 0.46-0.62. These numbers go up with the
number of new items rated. Which means that even for item 5, given the
highest possible score of 4, this approach can't even give back a rating for
a rated item close to its actual value.
Moreover, the new user's ratings I test, {0,1,0,3,4,0,2,3,0,1}, are actually
identical to an existing user that was used to build the model and whose
predicted ratings are very reasonable, looking like
{0.5,0.98,0.89,3.23,4.1,1.01,2.32,2.99,3.5,1.1}.

I must be doing something wrong or missing something. Is there anything you
or anyone from the list with fold-in experience can suggest I try or
consider that would explain why this is happening? I expected that predicted
ratings from fold-in would not be as good as regenerating the model, but not
this bad.

Many thanks,
Chloe

Re: Fold-in for ALSWR

Posted by Sean Owen <sr...@gmail.com>.

For simplicity let's consider a brand-new user first, not a new rating
for existing user. I'll use the notation from my slides that you
mention, A = X * Y'. To clarify, I think you mean you have a new A_u
row, and want to know X_u.

The two expressions are not alternatives, they're the same thing,
though abusing notation a little. In the first that's not a real
inverse since Y' is not even square. It needs to be something that
works as a right-inverse. The second elaborates what that is -- that
is an inverse, of a square matrix.

As you might have seen on this list I actually investigated this part
a lot last week since I thought there was a numerical problem. There
is a gotcha here, but it's not to do with inverses. Yes you can use a
QR decomposition to find the inverse. It's fast. There is a QR
decomposition in the project, or, I've had good luck with Commons
Math. (In fact 3.2 has a proper rank-revealing QR decomposition now,
which is useful for other reasons in this context.)

I ended up not actually computing the inverse, and instead using a QR
decomposition to solve A_u = X_u * Y' for X_u. It's nearly exactly the
same thing, but more principled.

Back to handling a new event for an existing user -- it gets trickier
here since you don't want to fold-in A_u = [ .. 0 0 1 0 0 .. ] -- it's
too big, especially if A_u already has a value near 1 in that spot.
Note that you also want to consider updating Y similarly; you've
learned about the user but you also learned about the item. Same
thing, just for the other matrix.

Sean

On Wed, Apr 10, 2013 at 9:29 PM, Chloe <ch...@gmail.com> wrote:
> Hi everyone,
>
> I am reaching out to the list requesting some help/advice on implementing
> fold-in with the Alternating Least Squares algo in Mahout, a problem on
> which I am stumped. I've read other posts on the list and over on SO, like:
>
> http://stackoverflow.com/questions/12444231/svd-matrix-conditioning-
> how-to-project-from-original-space-to-conditioned-spac
> and
> http://stackoverflow.com/questions/12857693/mahout-how-to-make-
> recommendations-for-new-users
> including the slide show talk posted in the last link. This slide show seems
> to be the most relevant to my purposes, specifically slide 14, but I can't fully
> understand it.
>
> At the end of ALS I have Ratings = A = UM', where U is user feature matrix
> and M is item feature matrix.
> For an existing user with a new rating, I plan to update the Auser only,which
> is (1 x number of items) by replacing/adding new rating at the appropriate
> index and then do:
> Uuser = Auser * V'   ( 1 x items)(items x features)
> Sub Uuser back into U and get new recommendations
>
> How do I address a NEW user? From the slide:
> X = A (Y')^-1
> or
> X = A Y ((Y'Y) ^-1)
>
> Is the 2nd preferable? If so, why? I would appreciate a more 'wordy'
> explanation or someone pointing me to any papers I can read on this. Also,
> as a slight offshoot, given that mahout doesn't have matrix inversion, is it
> more advisable to use QR decomp to find a pseudoinverse or an outside
> package to do inversion?
>
> Apologies for the long post which I did try and make concise, and thank
> you for any help you offer.
>
> Chloe
>
>