Posted to dev@mahout.apache.org by Jay Sellers <ja...@gmail.com> on 2010/06/23 00:50:18 UTC

User/Items Reco Engine clustering

I'm looking to enhance a product recommendation engine. It currently works
with all the data as a whole, and I want to introduce clustering/grouping. It's
model-based, and the relationship is the common User-Items relationship.
Originally I was thinking of using Canopy / k-means clustering, then
determining which cluster a user is in and executing Item Similarity against
only that cluster of items. However, I can't figure out how to build a
SequenceFile using vectors with the User/Items relationship; I don't know
which data points to feed into the vector. So I scratched that idea and turned
my attention to Lucene, thinking that this is really a document problem, where
users are the documents and the items are the content. I should be able to ask
Lucene: give me documents that look like "productId3 productId9056
productId234".
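
One possible answer to the "which data points" question, offered as a sketch and not as Mahout's own API: give every distinct item a column number and represent each user as a sparse vector with 1.0 in the column of each item that user has interacted with. The function name `build_user_vectors` and the sample pairs are hypothetical, and writing the result into a Hadoop SequenceFile is deliberately left out.

```python
from collections import defaultdict


def build_user_vectors(pairs):
    """Turn (user_id, item_id) pairs into sparse binary user vectors.

    Returns (item_index, vectors):
      item_index maps item_id -> column number,
      vectors maps user_id -> {column: 1.0}.
    """
    item_index = {}
    vectors = defaultdict(dict)
    for user_id, item_id in pairs:
        # Assign the next free column the first time an item is seen.
        column = item_index.setdefault(item_id, len(item_index))
        vectors[user_id][column] = 1.0
    return item_index, dict(vectors)


# Hypothetical interaction log; the IDs are made up for illustration.
pairs = [
    ("user1", "productId3"),
    ("user1", "productId9056"),
    ("user2", "productId3"),
    ("user2", "productId234"),
]
item_index, vectors = build_user_vectors(pairs)
print(item_index)
print(vectors["user1"])
```

Each user vector then has one dimension per distinct item, which is the general shape a k-means style clusterer takes as input.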

I'm looking for any and all feedback from those experienced in the
recommendation world, specifically with the grouping of users and items.

Thanks,
-Jay

RE: User/Items Reco Engine clustering

Posted by Vivek Khanna <vi...@hotmail.com>.

Another way to look at the problem is to consider user purchases/actions as features describing a user in a vector space. Then the problem is reduced to finding users similar to each other based on this feature set.
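
To make the feature-vector idea concrete, here is a minimal sketch that assumes each user is stored as a dictionary of feature weights and that cosine similarity is the measure of "similar"; the thread does not name a measure, so that choice, the helper names and the sample users are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_users(target, users, k=2):
    """Return the k users most similar to `target` as (name, score) pairs."""
    scored = [
        (cosine_similarity(users[target], vec), name)
        for name, vec in users.items()
        if name != target
    ]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical users described by purchase/view features.
users = {
    "alice": {"bought:book": 1.0, "viewed:lamp": 1.0},
    "bob": {"bought:book": 1.0, "viewed:lamp": 1.0, "viewed:desk": 1.0},
    "carol": {"viewed:desk": 1.0},
}
print(most_similar_users("alice", users))
```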

Clustering would be overly complex in my humble opinion.

I agree with Sean that the Lucene-based construction as you describe it, Jay, is item-based and not user-based.

Hope this helps.

> Date: Wed, 23 Jun 2010 08:59:03 +0100
> Subject: Re: User/Items Reco Engine clustering
> From: srowen@gmail.com
> To: dev@mahout.apache.org
> 
> To me you're just describing user-based recommendation. You find a
> neighborhood of similar users, then examine their items, and recommend
> from those by taking a weighted average of the neighborhood's
> preferences.
> 
> Your Lucene-based construction then sounds like item-based
> recommendation. Find items similar to what the user prefers and
> recommend based on a weighted average, again.
> 
> Do I have that right?
> 
> And then, do you need a Hadoop-based implementation using SequenceFiles?
> What kind of data size are you looking at?
> 

Re: User/Items Reco Engine clustering

Posted by Sean Owen <sr...@gmail.com>.

To me you're just describing user-based recommendation. You find a
neighborhood of similar users, then examine their items, and recommend
from those by taking a weighted average of the neighborhood's
preferences.
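
The weighted average in that description can be sketched as follows. This is a plain-Python illustration, not Mahout code: it assumes the neighborhood has already been found, that similarities are positive, and that preferences are numeric ratings. The function name and the sample numbers are hypothetical.

```python
def predict_preference(neighbors, item):
    """Similarity-weighted average of the neighbors' preferences for `item`.

    `neighbors` is a list of (similarity, preferences) pairs, where
    preferences maps item -> rating. Neighbors that have not rated the
    item are skipped. Returns None when no neighbor has rated it.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for similarity, preferences in neighbors:
        if item in preferences:
            weighted_sum += similarity * preferences[item]
            weight_total += similarity
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical neighborhood: similarities and 1-5 ratings are made up.
neighbors = [
    (0.9, {"boots": 5.0, "watch": 2.0}),
    (0.3, {"boots": 1.0}),
]
print(predict_preference(neighbors, "boots"))
print(predict_preference(neighbors, "watch"))
print(predict_preference(neighbors, "scarf"))
```

The closer neighbor dominates the estimate for "boots" because its rating carries three times the weight of the other neighbor's.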

Your Lucene-based construction then sounds like item-based
recommendation. Find items similar to what the user prefers and
recommend based on a weighted average, again.
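
A rough sketch of the item-based variant, under assumptions that are not from the thread: item-to-item similarity is the Jaccard overlap of the sets of users who touched each item, and, because the data here is boolean (touched or not), candidates are ranked by summed similarity instead of a weighted average of ratings. All names and data are hypothetical.

```python
from collections import defaultdict


def item_user_sets(user_items):
    """Invert {user: set(items)} into {item: set(users)}."""
    inverted = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            inverted[item].add(user)
    return dict(inverted)


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_item_based(user, user_items, top_n=3):
    """Rank items the user lacks by similarity to the items the user has."""
    by_item = item_user_sets(user_items)
    owned = user_items[user]
    scores = {}
    for candidate, candidate_users in by_item.items():
        if candidate in owned:
            continue
        score = sum(jaccard(candidate_users, by_item[item]) for item in owned)
        if score > 0.0:
            scores[candidate] = score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical data set.
user_items = {
    "u1": {"tent", "stove", "lantern"},
    "u2": {"tent", "stove"},
    "u3": {"novel"},
    "u4": {"tent"},
}
print(recommend_item_based("u4", user_items))
```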

Do I have that right?

And then, do you need a Hadoop-based implementation using SequenceFiles?
What kind of data size are you looking at?

On Wed, Jun 23, 2010 at 12:49 AM, Jay Sellers <ja...@gmail.com> wrote:
> Thanks Vivek,
> We do not have predefined clusters/groups. We expect the groups to mutate as
> more history (data) is accumulated.  A simple use case is as follows:
> John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
> boots.
> Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
> watch.
> Larry has viewed a pair of jeans, a cowboy hat and a red shirt.
>
> When we send Larry and his items into our reco engine, we would expect a
> pair of boots and a pocket watch to be recommended.  We'd expect this
> because we've determined that John and Scott are 'like' Larry and thus are
> in the same cluster.
>
> Again, we fully expect the cluster members to change, as user/item data
> accumulates.
>

Re: User/Items Reco Engine clustering

Posted by Jay Sellers <ja...@gmail.com>.

Thanks Vivek,
We do not have predefined clusters/groups. We expect the groups to mutate as
more history (data) is accumulated.  A simple use case is as follows:
John has viewed a pair of jeans, a cowboy hat, a red shirt and a pair of
boots.
Scott has viewed a pair of jeans, a cowboy hat, a red shirt and a pocket
watch.
Larry has viewed a pair of jeans, a cowboy hat and a red shirt.

When we send Larry and his items into our reco engine, we would expect a
pair of boots and a pocket watch to be recommended.  We'd expect this
because we've determined that John and Scott are 'like' Larry and thus are
in the same cluster.
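
The John/Scott/Larry case can be checked with a small sketch that needs no explicit clustering step. It assumes Jaccard overlap of viewed items as the user similarity and an arbitrary cutoff of 0.5 for who counts as 'like' Larry; both are illustrative choices, not something the thread specifies.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbors(target, views, min_similarity=0.5):
    """Score unseen items by the summed similarity of neighbors who viewed them."""
    seen = views[target]
    scores = {}
    for other, items in views.items():
        if other == target:
            continue
        similarity = jaccard(seen, items)
        if similarity < min_similarity:
            continue  # not similar enough to count as a neighbor
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


views = {
    "John": {"jeans", "cowboy hat", "red shirt", "boots"},
    "Scott": {"jeans", "cowboy hat", "red shirt", "pocket watch"},
    "Larry": {"jeans", "cowboy hat", "red shirt"},
}
print(recommend_from_neighbors("Larry", views))
```

Because the neighborhood is recomputed from the current views on every call, the set of 'like' users shifts on its own as more history accumulates.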

Again, we fully expect the cluster members to change, as user/item data
accumulates.

On Tue, Jun 22, 2010 at 4:37 PM, Vivek Khanna <vi...@hotmail.com> wrote:

>
> Hi,
>
> For your clustering/grouping, what is your expectation? Do you have
> pre-defined clusters/groups that you want to assign the items to,
> or do you envision a system where clusters/groups will change and evolve as
> the data changes?
>
> In either case, it seems you are looking for unsupervised approaches. Is that
> correct?
>
> I am new to this email list, so pardon my ignorance, but from the work I
> have done in the past with IR and ML (clustering, "more like this",
> categorization, topic detection, etc.), my advice to you is to identify your
> requirements, use cases and page flow interactions as the first step. :)
>
> Hope this helps!
>
> Vivek.
>

RE: User/Items Reco Engine clustering

Posted by Vivek Khanna <vi...@hotmail.com>.

Hi,

For your clustering/grouping, what is your expectation? Do you have pre-defined clusters/groups that you want to assign the items to, or do you envision a system where clusters/groups will change and evolve as the data changes?

In either case, it seems you are looking for unsupervised approaches. Is that correct?

I am new to this email list, so pardon my ignorance, but from the work I have done in the past with IR and ML (clustering, "more like this", categorization, topic detection, etc.), my advice to you is to identify your requirements, use cases and page flow interactions as the first step. :)

Hope this helps!

Vivek.
 
> Date: Tue, 22 Jun 2010 15:50:18 -0700
> Subject: User/Items Reco Engine clustering
> From: jaysellers@gmail.com
> To: dev@mahout.apache.org
> 
> I'm looking to enhance a product recommendation engine. It currently works
> with all the data as a whole, and I want to introduce clustering/grouping. It's
> model-based, and the relationship is the common User-Items relationship.
> Originally I was thinking of using Canopy / k-means clustering, then
> determining which cluster a user is in and executing Item Similarity against
> only that cluster of items. However, I can't figure out how to build a
> SequenceFile using vectors with the User/Items relationship; I don't know
> which data points to feed into the vector. So I scratched that idea and turned
> my attention to Lucene, thinking that this is really a document problem, where
> users are the documents and the items are the content. I should be able to ask
> Lucene: give me documents that look like "productId3 productId9056
> productId234".
> 
> I'm looking for any and all feedback from those experienced in the
> recommendation world, specifically with the grouping of users and items.
> 
> Thanks,
> -Jay
 		 	   		  