Posted to user@mahout.apache.org by Ranjith Uthaman <ra...@flytxt.com> on 2012/10/18 07:42:56 UTC

Pseudo-Inverse map reduce implementation

Hi,

Does a MapReduce implementation of the pseudo-inverse of a matrix exist in the current Mahout framework? What are the various ways to achieve it?

Thanks & Regards,
RANJITH P UTHAMAN

RE: Pseudo-Inverse map reduce implementation

Posted by Ranjith Uthaman <ra...@flytxt.com>.

Many thanks, Owen, for the prompt replies.
I will post an update here on the quality of the recommendations.



Re: Pseudo-Inverse map reduce implementation

Posted by Sean Owen <sr...@gmail.com>.

So you have a factorization like A = X * Y' and you are looking for
the right inverse of Y' (where Y is the item-feature matrix)?

This is just Y * pinv(Y' * Y). Y' * Y takes a little work to compute,
but can be done in one pass over the matrix. Y' * Y is just a
1000x1000 matrix, which you can invert in memory quickly. Then it's
one more multiply. It shouldn't take 40 seconds, and it is also
something you need not compute at request time every time. Y won't
change rapidly, so recomputing the right inverse periodically, rather
than keeping it completely up to date, won't affect things much.
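
A minimal sketch of the computation described above, in plain Python on
toy data. The helper names (gram_one_pass, invert, matmul,
right_inverse_of_transpose) are made up for illustration and are not
Mahout APIs. It assumes Y has full column rank, so that Y' * Y can be
inverted directly; if it does not, a pseudo-inverse of the small matrix
would be needed instead.

```python
# Sketch: right inverse of Y' computed as Y * inv(Y' * Y), in plain Python.
# Assumption: Y (items x features) has full column rank, so Y' * Y is invertible.

def gram_one_pass(rows, k):
    """Accumulate G = Y' * Y as a sum of outer products, one row of Y at a time."""
    g = [[0.0] * k for _ in range(k)]
    for row in rows:
        for i in range(k):
            ri = row[i]
            if ri == 0.0:
                continue  # a zero entry contributes nothing to row i of G
            for j in range(k):
                g[i][j] += ri * row[j]
    return g


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix on the right.
    a = [[float(v) for v in m[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Dense matrix product of two lists-of-rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def right_inverse_of_transpose(y):
    """Return R = Y * inv(Y' * Y), which satisfies Y' * R = I."""
    k = len(y[0])
    return matmul(y, invert(gram_one_pass(y, k)))


# Toy data: 4 items, 2 features (the sizes in this thread are ~10000 x 1000).
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
R = right_inverse_of_transpose(Y)  # compute once, reuse for every user

# Check: Y' * R should be the 2x2 identity.
Yt = [list(col) for col in zip(*Y)]
check = matmul(Yt, R)

# One user's row of A = X * Y', built from known weights, then recovered.
x_true = [[2.0, -1.0]]
a_row = matmul(x_true, Yt)
x = matmul(a_row, R)
```

The point of the design is that R depends only on Y, so it can be
computed once and cached; each user then costs a single vector-matrix
multiply rather than a fresh pseudo-inverse.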

Sean



RE: Pseudo-Inverse map reduce implementation

Posted by Ranjith Uthaman <ra...@flytxt.com>.

The end goal is to build a content-based recommender of items for each user. The user-based and item-based recommenders in Mahout, as discussed in the Mahout in Action book, do not fare very well given the data available. The book also hints at a content-based recommender approach.
Hence, we intend to use a linear-regression kind of model to achieve better recommendations. The confidential nature of the data does not allow it to be discussed here :-( , but the scale at which this needs to be performed is as follows:
Number of users: 5-10 million
Number of items: ~10000 [which might increase to a million in future]
Item feature vector size: 1000 [which might increase to 10000 features in future]

We need to find the weight vector using the pseudo-inverse of the item matrix; for each user, the matrix dimensions are 10000 x 1000. Since the number of users is large, this needs to be done frequently.
On a single desktop machine with 2 cores and an average configuration, pinv of a matrix of this size takes around 40 seconds.
This is too long for customers using mobile web portals whose index page is completely customised using the recommendation results obtained above. On top of that, rendering the results to create the page will take further computation time.

Kindly guide.

Thanks & Regards,
Ranjith


Re: Pseudo-Inverse map reduce implementation

Posted by Sean Owen <sr...@gmail.com>.

I asked in reply on Quora -- what exactly are you computing? What is
the size of the input, and are you talking about a generalized inverse?
Depending on this, there are easier ways than an SVD.


Re: Pseudo-Inverse map reduce implementation

Posted by Ted Dunning <te...@gmail.com>.

Computing the SVD with stochastic projection is your best bet.
