Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2010/11/23 08:24:58 UTC

Matrix-based recommendation analysis

The GroupLens dataset has User, Item, Rating and Timestamp.
We will use the rating of 1-5 as-is, but will reduce the timestamp
field to day of the week.
A missing rating defaults to 3 (neutral). There are 5 ratings
total in the sample:

U1, I1, 2, ?
U1, I3, 4, ?
U2, I1, 4, ?
U2, I2, 5, T
U2, I3, 3, ?

(We'll get to the question marks later.)
Now, make two matrices: User vs. Item and Item vs. Day of the Week.
User vs. Item contains ratings, and Item vs. Day of the Week
contains the number of rating records for that item on that day of the
week. Ratings only cover Sunday, Monday and Tuesday.

Formatting tables in proportional fonts just plain doesn't work, thus the
alternate format.

2 Users vs. 3 Items:
I1,I2,I3
{
U1  {2,3,4}
U2  {4,5,3}
 }
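
A minimal Python sketch of how the User vs. Item matrix above could be built from the five rating records, with missing ratings filled in as 3. The names build_user_item and DEFAULT_RATING are illustrative only and are not Mahout APIs.

```python
# Rating records from the post as (user, item, rating); the day column is left out.
records = [
    ("U1", "I1", 2),
    ("U1", "I3", 4),
    ("U2", "I1", 4),
    ("U2", "I2", 5),
    ("U2", "I3", 3),
]
users = ["U1", "U2"]
items = ["I1", "I2", "I3"]
DEFAULT_RATING = 3  # assumption from the post: a missing rating counts as neutral


def build_user_item(records, users, items, default=DEFAULT_RATING):
    """Return a dense users x items matrix as a list of rows."""
    lookup = {(user, item): rating for user, item, rating in records}
    return [[lookup.get((user, item), default) for item in items] for user in users]


user_item = build_user_item(records, users, items)
print(user_item)  # [[2, 3, 4], [4, 5, 3]]
```

The only filled-in cell is U1/I2, which is where the 3 in U1's row comes from.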

3 Items vs. 7 Days of the Week
S,M,T,W,T,F,S
{
I1 {1,0,1,0,0,0,0}
I2 {0,0,1,0,0,0,0}
I3 {0,1,1,0,0,0,0}
}

Now, multiply these two matrices. The product is 2 Users vs. 7 Days
of the Week:
S,M,T,W,T,F,S
{
U1 {2,4,9,0,0,0,0}
U2 {4,3,12,0,0,0,0}
}

This matrix carries the total amount of enthusiasm for each user on
each day. To get the average enthusiasm of each user, divide each row,
element by element, by the total number of ratings per day (1 on
Sunday, 1 on Monday, 3 on Tuesday):
S,M,T,W,T,F,S
{
U1 {2,4,3,0,0,0,0}
U2 {4,3,4,0,0,0,0}
}
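
The arithmetic above can be checked with a short plain-Python sketch. The matrices are copied from the post; the helper matmul is written here for illustration and is not a library call.

```python
# Matrices copied from the post, stored as lists of rows.
user_item = [
    [2, 3, 4],  # U1 ratings for I1, I2, I3 (3 is the default for "no rating")
    [4, 5, 3],  # U2
]
item_day = [
    [1, 0, 1, 0, 0, 0, 0],  # I1: rating counts for S, M, T, W, T, F, S
    [0, 0, 1, 0, 0, 0, 0],  # I2
    [0, 1, 1, 0, 0, 0, 0],  # I3
]


def matmul(a, b):
    """Plain-Python product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


user_day = matmul(user_item, item_day)

# Number of rating records per day = column sums of item_day.
ratings_per_day = [sum(col) for col in zip(*item_day)]

# Element-wise division; days without any ratings stay at 0.
average = [
    [total / n if n else 0.0 for total, n in zip(row, ratings_per_day)]
    for row in user_day
]

print(user_day)         # [[2, 4, 9, 0, 0, 0, 0], [4, 3, 12, 0, 0, 0, 0]]
print(ratings_per_day)  # [1, 1, 3, 0, 0, 0, 0]
print(average)
```

The last line prints the same values as the averaged matrix above, as floats.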

Did I get this right, Ted?

BTW, where are your slides for this topic? I've seen them a couple of
times in presentations (live and on Fora.tv), but can't find them.

-- 
Lance Norskog
lance.norskog@gmail.com

Re: Matrix-based recommendation analysis

Posted by Ted Dunning <te...@gmail.com>.

Check here: http://www.slideshare.net/tdunning/newsfeed

On Mon, Nov 22, 2010 at 11:24 PM, Lance Norskog <go...@gmail.com> wrote:

> ...

> BTW, where are your slides for this topic? I've seen them a couple of
> times in presentations (live and on Fora.tv), but can't find them.

Re: Matrix-based recommendation analysis

Posted by Lance Norskog <go...@gmail.com>.

About the missing days of the week: I failed to record the days of the
week for the other four ratings.




-- 
Lance Norskog
goksron@gmail.com

Re: Matrix-based recommendation analysis

Posted by Ted Dunning <te...@gmail.com>.

Not necessarily.  My own implementations have done a large off-line
computation and then done the final bit on-line.

On Thu, Nov 25, 2010 at 11:24 AM, Dinesh B Vadhia <dineshbvadhia@hotmail.com> wrote:

> Thanks!  Sorry, for ii) I meant:  Are the predictions calculated offline?

Re: Matrix-based recommendation analysis

Posted by Dinesh B Vadhia <di...@hotmail.com>.

Thanks!  Sorry, for ii) I meant:  Are the predictions calculated offline?


Re: Matrix-based recommendation analysis

Posted by Ted Dunning <te...@gmail.com>.

On Thu, Nov 25, 2010 at 2:12 AM, Dinesh B Vadhia
<di...@hotmail.com> wrote:

> Hello!  Have looked at the presentation and trying to get my head around
> it:
>
> i.  is collaborative filtering being bypassed?
>

No.  This is just a new form of collaborative algorithm.

> ii.  are new entries (observations) added dynamically or as a batch
> process?
>

At the very high level of the talk, this is not specified.  You can do it
either way.

Re: Matrix-based recommendation analysis

Posted by Dinesh B Vadhia <di...@hotmail.com>.

Hello!  Have looked at the presentation and trying to get my head around it:

i.  is collaborative filtering being bypassed?
ii.  are new entries (observations) added dynamically or as a batch process?

Re: Matrix-based recommendation analysis

Posted by Ted Dunning <te...@gmail.com>.

For cross recommender comprehension, I recommend something like the example
in my slide show.  In that example, users issued query terms (giving the u x
q matrix B) and they watched videos (giving the u x v matrix A).  The cross
recommendation is a smoothed version of A' B (the result is v x q).

This matrix could be used to take query terms and recommend videos.  That is
(A' B) q = v.  With suitable cleanup of the A'B to suppress spurious
entries, this makes a workable search engine.

Bringing a concept like days of the week into the mix is a bit confusing.
That could give you a smoothed popularity of content per day of the week,
but that is normally done by much simpler means.  The biggest difference is
that you can't pick three days of the week, but you can put three terms into
a query.
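
The A' B construction described above can be sketched in a few lines of plain Python. All the data here is invented for illustration (3 users, 2 videos, 3 query terms), the helper names are hypothetical, and the smoothing and cleanup step is left out, so this shows only the raw co-occurrence counts.

```python
# Hypothetical toy data: 3 users, 2 videos, 3 query terms.
# A[u][v] = 1 if user u watched video v (the u x v matrix).
A = [
    [1, 0],
    [1, 1],
    [0, 1],
]
# B[u][t] = 1 if user u issued query term t (the u x q matrix).
B = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Raw cross-occurrence counts A' B, videos x terms. No smoothing is applied.
cross = matmul(transpose(A), B)


def recommend(cross, query):
    """Score each video for a query given as a 0/1 vector over terms."""
    return [sum(c * q for c, q in zip(row, query)) for row in cross]


scores = recommend(cross, [1, 0, 0])  # a query containing only term 0
print(cross)   # [[2, 1, 0], [1, 0, 1]]
print(scores)  # [2, 1]
```

A query with several terms is just a vector with several ones, for example recommend(cross, [1, 0, 1]), which is the case that has no natural counterpart for days of the week.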


Re: Matrix-based recommendation analysis

Posted by Lance Norskog <go...@gmail.com>.

I'm not trying to distinguish them. That is the "find the Netflix user" paper :)

I just want to understand the cross-recommender concept, that's all.
Yes, this sample is too small to impute "enthusiasm"; the numbers are
recommendation values.

(If the rest of you want to follow along:
http://www.slideshare.net/tdunning/intelligent-search , slides 35-36)

On Mon, Nov 22, 2010 at 11:50 PM, Sean Owen <sr...@gmail.com> wrote:
> (PS I don't think that link from Ted is publicly visible but try
> http://www.slideshare.net/tdunning )
>
> Maybe I'm walking into half of another conversation but what's the
> question or goal here?
>
> I don't think the matrix product contains quite what you're saying.
> For example U1 records only 2 ratings but has some "enthusiasm" on 3
> separate days in the matrix product. The product is mashing together
> item-day associations from all users and applying them to each user.
>
> Conceptually user-item-day is the 3-dimensional matrix that it sounds
> like, if you want to distinguish associations from different users to
> different items on different days.



-- 
Lance Norskog
goksron@gmail.com

Re: Matrix-based recommendation analysis

Posted by Sean Owen <sr...@gmail.com>.

(PS I don't think that link from Ted is publicly visible but try
http://www.slideshare.net/tdunning )

Maybe I'm walking into half of another conversation but what's the
question or goal here?

I don't think the matrix product contains quite what you're saying.
For example U1 records only 2 ratings but has some "enthusiasm" on 3
separate days in the matrix product. The product is mashing together
item-day associations from all users and applying them to each user.

Conceptually user-item-day is the 3-dimensional matrix that it sounds
like, if you want to distinguish associations from different users to
different items on different days.
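
To make the U1 example concrete, here is a small sketch using the two matrices from the original post. It only recomputes U1's row of the product and counts the days that end up with a nonzero score; the variable names are illustrative.

```python
user_item = [[2, 3, 4], [4, 5, 3]]  # from the original post; 3 is the default rating
item_day = [
    [1, 0, 1, 0, 0, 0, 0],  # I1
    [0, 0, 1, 0, 0, 0, 0],  # I2
    [0, 1, 1, 0, 0, 0, 0],  # I3
]

u1_actual_ratings = 2  # U1 rated only I1 and I3 in the sample

# U1's row of the User x Day product.
u1_row = [sum(r * c for r, c in zip(user_item[0], col)) for col in zip(*item_day)]
days_with_score = sum(1 for value in u1_row if value != 0)

print(u1_row)           # [2, 4, 9, 0, 0, 0, 0]
print(days_with_score)  # 3
```

U1's Tuesday value of 9 also includes the default 3 for I2, an item U1 never rated, because I2 has a Tuesday record from U2.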



Re: Matrix-based recommendation analysis

Posted by Ted Dunning <te...@gmail.com>.

I don't know the goal here.

What you did was start with two shadows of a 3-dimensional data set (user x
item x dayOfWeek) in the form of two projections into 2-dimensional form
(user x item formed by summing over dayOfWeek and item x dayOfWeek formed by
summing over user).

You seem to be trying to form another shadow (user x dayOfWeek) by composing
the first two.  Since you have lost information in the first place, you can't
necessarily do this except in special cases.
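
The loss of information can be illustrated with a sketch, under the assumption that the four unrecorded days are filled in by hand. Both day assignments below are invented (only U2/I2 on Tuesday comes from the thread), and both are consistent with the User vs. Item and Item vs. Day matrices in the original post, yet they give different per-user, per-day totals.

```python
from collections import defaultdict

# Two hypothetical day assignments for the same five ratings.
records_a = [
    ("U1", "I1", "Su", 2), ("U1", "I3", "Mo", 4),
    ("U2", "I1", "Tu", 4), ("U2", "I2", "Tu", 5), ("U2", "I3", "Tu", 3),
]
records_b = [
    ("U1", "I1", "Tu", 2), ("U1", "I3", "Tu", 4),
    ("U2", "I1", "Su", 4), ("U2", "I2", "Tu", 5), ("U2", "I3", "Mo", 3),
]


def shadows(records):
    """Project (user, item, day, rating) records onto three 2-D views."""
    user_item = defaultdict(int)  # ratings summed over days
    item_day = defaultdict(int)   # record counts summed over users
    user_day = defaultdict(int)   # ratings summed over items
    for user, item, day, rating in records:
        user_item[user, item] += rating
        item_day[item, day] += 1
        user_day[user, day] += rating
    return dict(user_item), dict(item_day), dict(user_day)


ui_a, id_a, ud_a = shadows(records_a)
ui_b, id_b, ud_b = shadows(records_b)

print(ui_a == ui_b)  # True: same user x item shadow
print(id_a == id_b)  # True: same item x day shadow
print(ud_a == ud_b)  # False: the user x day shadows differ
```

Since two different data sets share the first two shadows, no function of those two shadows alone, including their matrix product, can recover the third in general.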

What you have here is the beginnings of a rank-1 decomposition of a tensor.
With matrices, the minimum squared error decomposition of this type is the
SVD and is unique up to the order of the singular values and sign.
Unfortunately, there is no comparable unique decomposition for tensors.
Lots of people have worked on the problem, but there is no clear consensus
on the best way to approach it.

There is a related point where you have two independent actions user x
item_type_1 and user x item_type_2.  Here the product that gives you
item_type_1 x item_type_2 gives useful information because you don't really
have a tensor in the first case, just two matrices.  This is the case I am
talking about when I refer to "cross-recommendation".

I generally prefer to deal with problems like this from the viewpoint of
generalized logistic regression with latent factors chosen to provide a
model in a useful form.  This avoids the ambiguity associated with tensor
decompositions and leads directly to a form that can be optimized.

On Mon, Nov 22, 2010 at 11:24 PM, Lance Norskog <go...@gmail.com> wrote:

>
> Now, multiply these two matrices. The product is 2 Users vs. 7 Days
> of the Week:
> S,M,T,W,T,F,S
> {
> U1 {2,4,9,0,0,0,0}
> U2 {4,3,12,0,0,0,0}
> }
>
> This matrix carries the total amount of enthusiasm for each user on
> each day. To get the average enthusiasm of each user, divide each row
> by the total number of ratings per day:
> S,M,T,W,T,F,S
> {
> U1 {2,4,3,0,0,0,0}
> U2 {4,3,4,0,0,0,0}
> }
>
> Did I get this right, Ted?