
Posted to user@mahout.apache.org by Daniel Quach <da...@cs.ucla.edu> on 2012/05/08 09:53:39 UTC

how to implement item-based recommender on movie genre data?

Suppose that I want to give each movie a profile based on the genres it belongs to.

For naive and simplistic purposes, let's pretend that each movie has a vector where each column is a genre: a 1 in a column indicates that the movie has that genre, and a 0 indicates that it does not.

How would I feed such data into an item-based recommender? I want this recommender to use these vectors to calculate the similarity between items, which in turn is used for preference estimation (just as described in section 4.4.1 of the Mahout in Action book).

The example in the book is not immediately clear to me. The sample code does not mention the format of the data being used in creating the ItemSimilarity object.
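
For readers following along, here is a rough sketch of how genre vectors could drive item-based preference estimation. It assumes a common formulation, a similarity-weighted average of the ratings the user has already given, and it uses the Tanimoto coefficient between genre vectors as the item similarity. The class name, method names, and sample ratings are hypothetical; nothing here is taken from the book or from Mahout's source.

```java
/** Sketch: estimate a user's preference for a movie from genre-vector similarity. */
final class ItemBasedEstimate {

    // Tanimoto coefficient: shared genres divided by genres present in either movie.
    static double tanimoto(int[] a, int[] b) {
        int both = 0;
        int either = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) both++;
            if (a[i] == 1 || b[i] == 1) either++;
        }
        return either == 0 ? 0.0 : (double) both / either;
    }

    // Similarity-weighted average of the user's known ratings.
    // ratedVectors[i] is the genre vector of a movie the user rated ratings[i].
    static double estimate(int[] target, int[][] ratedVectors, double[] ratings) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < ratedVectors.length; i++) {
            double sim = tanimoto(target, ratedVectors[i]);
            weightedSum += sim * ratings[i];
            totalWeight += sim;
        }
        // No rated movie shares a genre with the target: no estimate is possible.
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        int[] target = {1, 0, 0, 1, 0};               // movie to estimate (hypothetical)
        int[][] rated = {{1, 0, 0, 0, 0}, {0, 1, 0, 1, 0}};
        double[] ratings = {5.0, 3.0};                // hypothetical user ratings
        System.out.println(estimate(target, rated, ratings));
    }
}
```

Movies that share more genres with the target pull the estimate more strongly toward their rating; movies with no genre in common contribute nothing.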

Re: how to implement item-based recommender on movie genre data?

Posted by Sean Owen <sr...@gmail.com>.

If you just need a similarity metric, you don't need a recommender;
similarity is just one part of a recommender. If the movie is the 'user' and
the genre is the 'item', then you just use a UserSimilarity implementation to
compute the similarity between any two movies. You don't need anything more
than that.

On Thu, May 10, 2012 at 7:29 AM, Daniel Quach <da...@cs.ucla.edu> wrote:

> Well, actually, I wanted to represent each movie with a vector
>
> [1, 0, 0, 1, 0]
>
> Where each column represents an explicit genre, a 1 indicating that the
> movie has that genre while a 0 indicates it is not (a crude representation,
> I'm sure)
>
> I wanted to implement an item based recommender that uses these vectors to
> compute similarity between items.
>
> I think I figured it out, I could represent vector data as preferences
> where instead of user ID's, it would be column indices. Then load that into
> a DataModel for use with the ItemSimilarity object. The
> ItemBasedRecommender could load the DataModel with userID's while using
> this ItemSimilarity object for calculating similarities.
>
> This could possibly be a poor choice from an efficiency, accuracy, and
> machine learning standpoint, I am not an expert on the subject at all.
>

Re: how to implement item-based recommender on movie genre data?

Posted by Daniel Quach <da...@cs.ucla.edu>.

Well, actually, I wanted to represent each movie with a vector

[1, 0, 0, 1, 0]

Here each column represents an explicit genre: a 1 indicates that the movie has that genre, and a 0 indicates that it does not (a crude representation, I'm sure).

I wanted to implement an item-based recommender that uses these vectors to compute similarity between items.

I think I figured it out: I could represent the vector data as preferences, with column indices in place of user IDs, and load that into a DataModel for use with the ItemSimilarity object. The ItemBasedRecommender could then load the DataModel with user IDs while using this ItemSimilarity object to calculate similarities.

This could well be a poor choice from an efficiency, accuracy, and machine learning standpoint; I am not an expert on the subject at all.
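
The layout described above (genre column index in the user ID position, movie in the item position) could be produced along these lines. This is only a sketch under assumptions: Mahout's FileDataModel reads comma-separated `userID,itemID[,preference]` lines, so each 1 in a genre vector becomes one `genreIndex,movieID` line and the 0 entries are simply left out. The class name, method name, and movie IDs 101 and 102 are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn binary genre vectors into "genreIndex,movieID" preference lines. */
final class GenreVectorsToPreferences {

    // genreVectors[m] is the 0/1 genre vector of the movie with ID movieIds[m].
    static List<String> toCsvLines(long[] movieIds, int[][] genreVectors) {
        List<String> lines = new ArrayList<>();
        for (int m = 0; m < movieIds.length; m++) {
            for (int g = 0; g < genreVectors[m].length; g++) {
                if (genreVectors[m][g] == 1) {
                    // The genre index stands in for the user ID, the movie is the item.
                    lines.add(g + "," + movieIds[m]);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        long[] movieIds = {101L, 102L};   // hypothetical movie IDs
        int[][] genreVectors = {
            {1, 0, 0, 1, 0},
            {0, 1, 0, 1, 0},
        };
        toCsvLines(movieIds, genreVectors).forEach(System.out::println);
    }
}
```

Written to a file, these lines would form the genre data set used only for the similarity computation, kept separate from the file that holds the real user ratings.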

On May 8, 2012, at 12:58 AM, Sean Owen wrote:

> So you have already decided, for each movie, whether it's in or not in each
> genre? And then you want to create a "profile" -- assuming you mean some
> kind of meta-genre?
> 
> This isn't a recommender problem; it's just a clustering problem. I'd use
> the Tanimoto similarity.
> You could run the clustering-based recommender just to build the clusters.
> You wouldn't use it for recommendations.
> 
> On Tue, May 8, 2012 at 8:53 AM, Daniel Quach <da...@cs.ucla.edu> wrote:
> 
>> Suppose that I want to give each movie a profile based on the genres each
>> contains.
>> 
>> For naive and simplistic purposes, let's pretend that each movie has a
>> vector where each column is a genre, a 1 in that column indicates that the
>> movie contains that genre, 0 otherwise.
>> 
>> How would I feed such data into an Item-based Recommender? I want this
>> recommender to use these vectors for calculating similarity for
>> recommendations, which in turn is used for preference estimation (just as
>> described in section 4.4.1 of the Mahout in Action book)
>> 
>> The example in the book is not immediately clear to me. The sample code
>> does not mention the format of the data being used in creating the
>> ItemSimilarity object.

Re: how to implement item-based recommender on movie genre data?

Posted by Sean Owen <sr...@gmail.com>.

So you have already decided, for each movie, whether or not it's in each
genre? And then you want to create a "profile" -- assuming you mean some
kind of meta-genre?

This isn't a recommender problem; it's just a clustering problem. I'd use
the Tanimoto similarity.
You could run the clustering-based recommender just to build the clusters.
You wouldn't use it for recommendations.
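
For reference, the Tanimoto similarity mentioned above can be sketched for binary genre vectors as the number of genres two movies share divided by the number of genres that at least one of them has. The code below is a standalone illustration with hypothetical names and sample vectors; in Mahout itself the corresponding class is TanimotoCoefficientSimilarity, which this sketch does not call.

```java
/** Sketch: Tanimoto (Jaccard) coefficient between two binary genre vectors. */
final class GenreTanimoto {

    // Returns |A and B| / |A or B|, where A and B are the sets of genres marked 1.
    static double tanimoto(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int both = 0;    // genres present in both movies
        int either = 0;  // genres present in at least one movie
        for (int i = 0; i < a.length; i++) {
            if (a[i] == 1 && b[i] == 1) both++;
            if (a[i] == 1 || b[i] == 1) either++;
        }
        // Two movies with no genres at all are treated as dissimilar here (a convention).
        return either == 0 ? 0.0 : (double) both / either;
    }

    public static void main(String[] args) {
        int[] movieA = {1, 0, 0, 1, 0};
        int[] movieB = {1, 0, 1, 1, 0};
        // 2 shared genres out of 3 distinct genres overall
        System.out.println(tanimoto(movieA, movieB));
    }
}
```

Because the 0 entries never count toward the shared total, two movies are not considered similar merely because both lack the same genres.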

On Tue, May 8, 2012 at 8:53 AM, Daniel Quach <da...@cs.ucla.edu> wrote:

> Suppose that I want to give each movie a profile based on the genres each
> contains.
>
> For naive and simplistic purposes, let's pretend that each movie has a
> vector where each column is a genre, a 1 in that column indicates that the
> movie contains that genre, 0 otherwise.
>
> How would I feed such data into an Item-based Recommender? I want this
> recommender to use these vectors for calculating similarity for
> recommendations, which in turn is used for preference estimation (just as
> described in section 4.4.1 of the Mahout in Action book)
>
> The example in the book is not immediately clear to me. The sample code
> does not mention the format of the data being used in creating the
> ItemSimilarity object.