Posted to user@mahout.apache.org by André Rigollet <an...@andrerobot.org> on 2010/07/26 21:15:58 UTC

Search with recommendations

Greetings,

I'm looking for advice on a project I'm working on. I'm building a
streaming service where users can build their own channels by combining
description tags (like Music, Comedy, Depeche Mode, Mickey Mouse, etc.),
which the service then uses to build playlists automatically.

As I understand it, I could model this as a search problem, where the
sum of the tags is a term vector, or where a search query is done for
each tag separately (1 vector per tag). I've got a prototype of this
using just MySQL, but a solution using Solr could be more robust.

The issue here is that I intend to combine this search problem with a CF
recommendation one: create playlists using user tastes in addition to
the description tags. This means that content the person would like
gets a higher score, and so a greater chance of being picked for a
playlist for that person.

I've been considering different approaches for combining these two
problems (search and recommendation):

1. Using a CF Recommender (like an item-based one) to retrieve a list
of what the user would like and then filtering it by the description
tags (w/Solr or Lucene). The problems with this approach are that I
would need to generate huge Top Item lists, and that in most cases
(especially when cold-starting or choosing) the Top Lists would have no
elements with a requested tag.

2. Using a search engine to retrieve the files matching the description
tags and then getting a recommendation score for them from a CF
Recommender. With this approach I could add other scoring criteria
such as recency, global popularity, etc. I don't know what could go
wrong with this.

3. From what I've read on this list, in a Lucene index it is possible to
add similar items as terms of an item. This would be enough for making
playlists based on a search query like: "tag:Comedy AND
(similar:Seinfeld OR similar:Cheers)" where Comedy is a description tag,
and Seinfeld and Cheers are the items the user liked. The problem
with this approach is that similarity would need to be based on a
user-independent criterion (no CF) like content similarity or global
vote co-occurrence.
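
A minimal sketch of approach 2 may make the trade-off easier to see.
It is only an illustration under stated assumptions: search_by_tags
stands in for the Solr/Lucene query, estimate_preference stands in for
whatever per-item estimate the CF recommender exposes, and the weights
and the 30-day recency scale are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    tags: frozenset
    age_days: float    # days since the item was published
    popularity: float  # global popularity, assumed already scaled to [0, 1]


def search_by_tags(catalog, wanted_tags):
    """Stand-in for the search engine: items carrying every wanted tag."""
    wanted = frozenset(wanted_tags)
    return [item for item in catalog if wanted <= item.tags]


def rank_candidates(candidates, estimate_preference,
                    w_cf=0.7, w_recency=0.2, w_popularity=0.1):
    """Blend a CF estimate with recency and popularity (weights are made up)."""
    scored = []
    for item in candidates:
        cf = estimate_preference(item.item_id)  # assumed to lie in [0, 1]
        recency = 1.0 / (1.0 + item.age_days / 30.0)
        score = w_cf * cf + w_recency * recency + w_popularity * item.popularity
        scored.append((score, item.item_id))
    # Highest score first; ties broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]


catalog = [
    Item(1, frozenset({"Comedy"}), age_days=10, popularity=0.9),
    Item(2, frozenset({"Comedy", "Music"}), age_days=400, popularity=0.2),
    Item(3, frozenset({"Music"}), age_days=1, popularity=0.8),
]
# Hypothetical CF estimates for one user; a real system would ask the recommender.
cf_estimates = {1: 0.1, 2: 0.95, 3: 0.5}
playlist = rank_candidates(search_by_tags(catalog, ["Comedy"]),
                           lambda item_id: cf_estimates.get(item_id, 0.0))
print(playlist)
```

Note that the recommender is asked for one estimate per search hit, so
the cost grows with the number of items matching the tags. That is
where the scaling concern with this approach would show up.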

I would like to know if one of the approaches I mentioned would work, or
if there is a better one. I like the 2nd approach myself, but I have
concerns that it wouldn't scale. My ideal solution would be something
like what I described in 3, but I don't know whether CF is possible
with that approach, or whether dropping CF altogether would be a good
idea overall.
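
For approach 3, the following sketch shows one way the "similar" terms
could be produced from global co-occurrence of likes and then matched
by a query of the form tag:X AND (similar:A OR similar:B). Everything
here is an assumption made for illustration: the data is invented, the
"index" is a plain dictionary rather than Lucene, and matches() only
imitates the boolean query.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_neighbors(likes_by_user, top_n=2):
    """For each item, the items most often liked by the same users."""
    counts = defaultdict(lambda: defaultdict(int))
    for liked in likes_by_user.values():
        for a, b in combinations(sorted(set(liked)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {
        item: [other for other, _ in
               sorted(others.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, others in counts.items()
    }


def matches(doc, tag, liked_items):
    """Imitates: tag:<tag> AND (similar:<a> OR similar:<b> ...)."""
    return tag in doc["tag"] and any(x in doc["similar"] for x in liked_items)


# Invented example data.
likes_by_user = {
    "u1": ["Seinfeld", "Cheers", "Frasier"],
    "u2": ["Seinfeld", "Frasier"],
    "u3": ["Cheers", "Frasier"],
    "u4": ["Depeche Mode"],
}
tags = {
    "Seinfeld": {"Comedy"},
    "Cheers": {"Comedy"},
    "Frasier": {"Comedy"},
    "Depeche Mode": {"Music"},
}

neighbors = cooccurrence_neighbors(likes_by_user)
# One "document" per item, with its similar items stored as terms.
index = {item: {"tag": tags[item], "similar": set(neighbors.get(item, []))}
         for item in tags}
hits = sorted(item for item, doc in index.items()
              if matches(doc, "Comedy", ["Seinfeld", "Cheers"]))
print(hits)
```

The co-occurrence counts are computed from every user's likes, so one
could argue the "similar" terms already carry some collaborative
signal even though the index itself is user-independent. The items the
user already liked also match here, and a real query would probably
need to decide whether to exclude them.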

I know that my question could be more related to Lucene or Solr than
Mahout, but since I'm more familiar with CF, having followed Mahout CF
since it was known as Taste, I would appreciate your advice from a
recommendations perspective.

Thanks,
André

Re: Search with recommendations

Posted by André Rigollet <an...@andrerobot.org>.

Owen, thanks for the advice. Using a Rescorer as a filter sounds like a
great idea. Yeah, I would need to build something more complex to make
it scale, but I'm not worried about that right now.

For the Rescorer, I would query Solr for matches, and if an item is
found I would also rescore it based on criteria such as recency. Because
running a query for every candidate item sounds expensive, I would need
to cache the results (if Solr doesn't do that already).
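
The idea in the paragraph above could be sketched as follows. The class
loosely mirrors the two operations a Mahout rescorer offers (filter an
item out, or adjust its score), but the class name, the lookup
functions, and the 30-day recency scale are all assumptions for the
example, and the Solr query is replaced by a fake function.

```python
from functools import lru_cache


class TagRecencyRescorer:
    """Filters out items lacking the wanted tags and favours recent items."""

    def __init__(self, wanted_tags, lookup_tags, lookup_age_days,
                 cache_size=10_000):
        self.wanted = frozenset(wanted_tags)
        self.lookup_age_days = lookup_age_days
        # Cache tag lookups so each item triggers at most one search query.
        self.cached_tags = lru_cache(maxsize=cache_size)(lookup_tags)

    def is_filtered(self, item_id):
        """True means: drop this item, it lacks one of the wanted tags."""
        return not self.wanted <= self.cached_tags(item_id)

    def rescore(self, item_id, original_score):
        """Scale the CF score down for older items (half weight at 30 days)."""
        return original_score * 30.0 / (30.0 + self.lookup_age_days(item_id))


queries = []  # records every call that would have gone to the search engine


def fake_solr_tags(item_id):
    """Hypothetical stand-in for a per-item Solr query."""
    queries.append(item_id)
    known = {1: frozenset({"Comedy"}), 2: frozenset({"Music"})}
    return known.get(item_id, frozenset())


ages = {1: 30.0, 2: 0.0}
rescorer = TagRecencyRescorer(["Comedy"], fake_solr_tags, ages.get)

for _ in range(3):
    rescorer.is_filtered(1)
    rescorer.is_filtered(2)
print(queries)
```

With the cache in place, repeated checks of the same item hit the fake
search function only once per item, which is the behaviour the caching
is meant to provide.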


On Tue, 2010-07-27 at 21:10 +0300, Sean Owen wrote:
> I think you could construe this as a search or CF problem and end up
> with something that works fine. I read into this that you want other
> users' preferences involved, which suggests it is a bit more of a CF
> problem. So I can sketch how that would work. It is also the only
> approach I'd be qualified to comment on.
> 
> One simple approach is to simply perform recommendation as usual, and
> use a Rescorer to filter "in" only items with certain tags. This is
> smarter than generating a long list of recommendations in order to
> ensure some appropriate results are there. I'd start with something
> simple like this and then make it more complex as needed to scale. But
> there's nothing too wrong with this to start.

Re: Search with recommendations

Posted by Sean Owen <sr...@gmail.com>.

I think you could construe this as a search or CF problem and end up
with something that works fine. I read into this that you want other
users' preferences involved, which suggests it is a bit more of a CF
problem. So I can sketch how that would work. It is also the only
approach I'd be qualified to comment on.

One simple approach is to simply perform recommendation as usual, and
use a Rescorer to filter "in" only items with certain tags. This is
smarter than generating a long list of recommendations in order to
ensure some appropriate results are there. I'd start with something
simple like this and then make it more complex as needed to scale. But
there's nothing too wrong with this to start.
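
A small sketch of why filtering during selection differs from
filtering a finished top-N list. This is not Mahout's implementation;
top_n, the scores, and the tags are invented for the example, and
is_filtered plays the role of the rescorer's filter.

```python
import heapq


def top_n(scores, n, is_filtered=lambda item_id: False):
    """Pick the n best-scoring items, skipping filtered ones while selecting."""
    kept = ((score, item_id) for item_id, score in scores.items()
            if not is_filtered(item_id))
    return [item_id for _, item_id in heapq.nlargest(n, kept)]


# Invented estimated preferences and tags for one user.
scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.4, "e": 0.3}
tags = {"a": {"Music"}, "b": {"Music"}, "c": {"Music"},
        "d": {"Comedy"}, "e": {"Comedy"}}


def lacks_comedy(item_id):
    return "Comedy" not in tags[item_id]


# Filtering after the fact: the top 3 contain no Comedy items at all.
post_filtered = [i for i in top_n(scores, 3) if not lacks_comedy(i)]
# Filtering while selecting: the top 3 are drawn from Comedy items only.
filtered_inline = top_n(scores, 3, lacks_comedy)
print(post_filtered, filtered_inline)
```

In this toy data the post-filtered list comes back empty, while the
inline filter still returns the two Comedy items, without having to ask
for a much longer list first.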
