Posted to user@mahout.apache.org by Camilo Rostoker <ca...@hotmail.com> on 2011/10/20 23:13:21 UTC

Recommendations without explicit ratings

Hello Mahout Users,

I'm relatively new to recommendations but have some experience with other ML techniques such as clustering.

I'm trying to generate recommendations with data that isn't the conventional user, item, rating format.

Here are some details on the problem I'm trying to solve; I'm hoping someone can suggest the best Mahout algorithm for it.

1) Users purchase items, potentially the same ones multiple times, but do not give explicit ratings to those items.
2) There is rich meta-data for the items (names, categories, descriptions, etc.).
3) The data is very sparse. There may be 100,000 items and on average a user may only ever purchase 1-10 of those items.

Some of the approaches I've considered after reading the Mahout documentation and mailing-list discussions are:

A) Use an item-based recommender, with the rating being the number of times the user bought the item (perhaps normalized to a 1-10 range).

B) Use the meta-data to generate similarities between the items, then simply recommend to a user the top N items that are similar to one that they've previously purchased. This could be implemented in Mahout by overriding the ItemSimilarity (as described in this post: http://lucene.472066.n3.nabble.com/Content-based-Recommender-Implementation-td913981.html). Obviously the challenging part here is figuring out how to generate a similarity score for the two items using the meta-data.

C) Use frequent item-sets to figure out other items that are usually bought with that one, and recommend those.

Any suggestions on this matter would be greatly appreciated.

Cheers,
Cam

Re: Recommendations without explicit ratings

Posted by Lance Norskog <go...@gmail.com>.

The Lucene/Solr MoreLikeThis feature is (I believe) a cosine distance search
across multiple fields of documents. Depending on the domain, its results
may be useful or surreal.
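
To make the "cosine distance" idea concrete, here is a minimal sketch of cosine similarity over bag-of-words term counts built from item meta-data text. It is not how Lucene/Solr MoreLikeThis is implemented; the function names and the whitespace tokenization are assumptions made purely for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Lower-case the text and count whitespace-separated terms."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two term-count vectors (0.0 to 1.0)."""
    va, vb = term_vector(text_a), term_vector(text_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

A real search engine would also weight terms (for example by inverse document frequency) and treat fields separately, which this sketch leaves out.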

Lance

-- 
Lance Norskog
goksron@gmail.com

Re: Recommendations without explicit ratings

Posted by Sean Owen <sr...@gmail.com>.

Great point, yes, you could easily use a text search engine to come up
with a similarity, if the things are text-like documents.
These aren't recommendations by themselves, but the similarities can
plug into the item-based recommender easily.
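
As a rough illustration of how any item-item similarity can be plugged into an item-based recommender, the sketch below scores each unpurchased item by summing its similarity to the items the user already bought. The `recommend` function, its parameters, and the summing rule are assumptions for this example, not Mahout's API or its exact scoring.

```python
def recommend(user_items, all_items, similarity, n=5):
    """Rank items the user has not bought by total similarity to owned items.

    similarity is any callable (item_a, item_b) -> float, so a text-based,
    category-based, or purchase-based measure can be swapped in unchanged.
    """
    scores = {}
    for candidate in all_items:
        if candidate in user_items:
            continue  # never recommend something already purchased
        score = sum(similarity(candidate, owned) for owned in user_items)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by item id for a stable ordering.
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

Because the similarity is passed in as a function, the recommender logic stays the same no matter where the similarity scores come from.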

Re: Recommendations without explicit ratings

Posted by Octavian Covalschi <oc...@gmail.com>.

I'm not an expert, but I do have a comment on B). Similarity between items'
meta-data can be computed with a search engine. For this I'm using Solr
(http://wiki.apache.org/solr/MoreLikeThis), which has a built-in feature that
returns similar documents; all you have to give it is a doc id. However, I
think this won't be a real recommendation, since similar items may not be
something the user wants. For example, if I bought an expensive camera, I may
not need any more similar items, right? But at the same time, if I'm buying
batteries every half a year, I may be interested in similar products. So it
depends.

Just a thought.


Re: Recommendations without explicit ratings

Posted by Sean Owen <sr...@gmail.com>.

On Thu, Oct 20, 2011 at 10:13 PM, Camilo Rostoker
<ca...@hotmail.com> wrote:
> A) Use an item-based recommender, with the rating being the number of times they bought the item (perhaps normalize the data between 1-10).

Yes, good. My first reaction might be to use the logarithm of the number
of purchases, or ignore it altogether and just record the association
(a 'boolean' pref) regardless of the purchase count. This only makes a
complete system together with B) or C) though.
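
A minimal sketch of the two options just mentioned, assuming purchases arrive as a mapping from (user, item) to a purchase count. The function name and the `1 + ln(count)` form are assumptions; the offset is added here only so that a single purchase maps to 1.0 rather than 0.

```python
import math


def to_preferences(purchases, mode="log"):
    """Convert {(user, item): purchase_count} into preference values.

    mode="log":     1 + ln(count), which dampens heavy repeat purchases
    mode="boolean": 1.0 for any purchase, ignoring the count entirely
    """
    prefs = {}
    for (user, item), count in purchases.items():
        if count <= 0:
            continue  # no purchase means no preference is recorded
        prefs[(user, item)] = (1.0 + math.log(count)) if mode == "log" else 1.0
    return prefs
```

With the log form, ten purchases count for a bit more than three times a single purchase instead of ten times, which keeps a few bulk buyers from dominating.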

>
> B) Use the meta-data to generate similarities between the items, then simply recommend to a user the top N items that are similar to one that they've previously purchased.  This could be implemented in Mahout by overriding the ItemSimilarity (as described in this post:  http://lucene.472066.n3.nabble.com/Content-based-Recommender-Implementation-td913981.html).   Obviously the challenging part here is figuring out how to generate a similarity score for the two items using the meta-data.

Exactly. You can plug in whatever logic you want there, but
equally you have to make up that logic. To start, you can experiment
with simplistic rules like considering only items in the same category
"similar". It might do reasonably well as a start.
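
The simplistic same-category rule could look something like the sketch below. The `categories` mapping (item id to a set of category names) and the function name are hypothetical; items are treated as similar if they share at least one category.

```python
def category_similarity(item_a, item_b, categories):
    """Return 1.0 if the two items share at least one category, else 0.0.

    categories maps an item id to a set of category names; items missing
    from the mapping are treated as having no categories.
    """
    cats_a = categories.get(item_a, set())
    cats_b = categories.get(item_b, set())
    return 1.0 if cats_a & cats_b else 0.0
```

A graded variant (for example, the fraction of shared categories) is an easy next step once the all-or-nothing rule has been tried.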

You can of course just use purchases, pure collaborative filtering, to
generate similarity. For instance log-likelihood similarity works
well.
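
For reference, here is a sketch of a log-likelihood ratio similarity computed from purchase data alone, based on Dunning's G-squared statistic over a 2x2 table of users (bought both, only A, only B, neither). The function names and the final `1 - 1/(1 + LLR)` squashing into the 0-1 range are assumptions for this example and should be checked against Mahout's source before relying on matching numbers.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k))."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a, users_b, num_users):
    """Similarity of two items given the sets of users who bought each."""
    both = len(users_a & users_b)
    only_a = len(users_a) - both
    only_b = len(users_b) - both
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)
```

The statistic measures how surprising the overlap is given each item's overall popularity, so two best-sellers bought together by chance score near zero. Note that it does not by itself distinguish unusually high overlap from unusually low overlap.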

>
> C) Use frequent item-sets to figure out other items that are usually bought with that one, and recommend those.

You could use frequent item sets to determine item-item similarity, as
in B). That's kind of what log-likelihood is doing. This would then be
a plug-in similarity to your item-based algorithm in A).

If you mean you just want to start with an *item*, and find similar
items, sure you can do that. This is simpler than the full recommender
problem.
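
The simpler item-to-item case can be sketched with plain co-occurrence counts, which amounts to frequent item-sets restricted to pairs. The `baskets` structure (user id to list of purchased items) and both function names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count, for each item, how many users also bought each other item."""
    counts = {}
    for items in baskets.values():
        # De-duplicate so repeat purchases by one user count only once.
        for a, b in combinations(sorted(set(items)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def most_similar_items(item, counts, n=3):
    """Return up to n items most often bought by the same users as item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(n)]
```

Raw counts favor popular items, which is one reason a measure such as log-likelihood is often preferred over counting alone.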