Posted to user@mahout.apache.org by Gokul Pillai <go...@gmail.com> on 2012/09/13 07:55:27 UTC

Newbie question on modeling a Recommender using Mahout when the matrix is sparse

I am trying out Mahout to come up with product recommendations for users
based on data that show what products they use today.
The data is not web-scale: about 300,000 users and 7 products. A few
comments about the data:
1. Since a user either has or does not have a particular product, every value
in the matrix is either "1" or "0" (rows are user IDs, columns are products).
2. All users have one basic product, so I left it out of the data model passed
to the Mahout recommender, on the assumption that a product everyone owns has
only a trivial effect on the recommendations.
3. The matrix itself is sparse. The number of users owning each product is:
A=31847 | 54754 | 1897 | 23154 | 2201 | 2766 | 33585
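To put a number on "sparse", the counts above can be summed and divided by the number of cells in the matrix. This is a minimal sketch, assuming exactly 300,000 users (the post says "about 300,000") and that the seven listed counts are the seven product columns fed to the recommender; the class name `OwnershipDensity` is made up for illustration.

```java
// Density of the user x product ownership matrix described in the post.
// Assumptions: exactly 300,000 users, and the seven counts are the seven columns.
public final class OwnershipDensity {

    // Number of users owning each product, copied from the post.
    static final long[] OWNERS_PER_PRODUCT = {31847, 54754, 1897, 23154, 2201, 2766, 33585};

    // Assumed user count ("about 300,000" in the post).
    static final long USERS = 300_000L;

    // Total number of "1" cells in the matrix.
    static long totalOwnerships(long[] counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    // Fraction of all cells (users x products) that are "1".
    static double density(long[] counts, long users) {
        return (double) totalOwnerships(counts) / (users * counts.length);
    }

    public static void main(String[] args) {
        long ones = totalOwnerships(OWNERS_PER_PRODUCT);
        System.out.printf("ownerships = %d%n", ones);
        System.out.printf("density = %.4f%n", density(OWNERS_PER_PRODUCT, USERS));
        System.out.printf("avg products per user = %.3f%n", (double) ones / USERS);
    }
}
```

Under those assumptions there are 150,204 ownerships out of 2,100,000 cells, so roughly 7% of the cells are "1", or about half a product per user on average.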

Steps followed:
1. Created a data source from the user-product table in the database:
        File ratingsFile = new File("datasets/products.csv");
        DataModel model = new FileDataModel(ratingsFile);
2. Created a recommender on this data:
        CachingRecommender recommender =
                new CachingRecommender(new SlopeOneRecommender(model));
3. Looped through all users and got the top ten recommendations for each:
        List<RecommendedItem> recommendations = recommender.recommend(userId, 10);
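For reference, the idea behind the Slope One scheme used in step 2 can be sketched in plain Java. This is a hedged, in-memory illustration of weighted Slope One, not Mahout's `SlopeOneRecommender` source; the class `SlopeOneSketch`, its methods, and the toy ratings are hypothetical. For every pair of items it stores the summed rating difference and the number of users who rated both, then estimates an unrated item as a count-weighted average of (average difference + the user's own rating of the other item).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal weighted Slope One for illustration only (not Mahout's implementation).
public final class SlopeOneSketch {

    // ratings.get(user).get(item) = preference value
    private final Map<Integer, Map<Integer, Double>> ratings;
    // diffSum[i][j] = sum of (r_i - r_j) over users who rated both i and j
    private final double[][] diffSum;
    // count[i][j] = number of users who rated both i and j
    private final int[][] count;

    SlopeOneSketch(Map<Integer, Map<Integer, Double>> ratings, int numItems) {
        this.ratings = ratings;
        this.diffSum = new double[numItems][numItems];
        this.count = new int[numItems][numItems];
        for (Map<Integer, Double> prefs : ratings.values()) {
            for (Map.Entry<Integer, Double> a : prefs.entrySet()) {
                for (Map.Entry<Integer, Double> b : prefs.entrySet()) {
                    if (!a.getKey().equals(b.getKey())) {
                        diffSum[a.getKey()][b.getKey()] += a.getValue() - b.getValue();
                        count[a.getKey()][b.getKey()]++;
                    }
                }
            }
        }
    }

    // Weighted Slope One estimate of a known user's preference for item;
    // NaN when no item the user rated was co-rated with it.
    double estimate(int user, int item) {
        double num = 0;
        int den = 0;
        for (Map.Entry<Integer, Double> e : ratings.get(user).entrySet()) {
            int j = e.getKey();
            int c = count[item][j];
            if (j != item && c > 0) {
                double avgDiff = diffSum[item][j] / c;
                num += (avgDiff + e.getValue()) * c;
                den += c;
            }
        }
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        // Toy data: three users, three items (0, 1, 2).
        Map<Integer, Map<Integer, Double>> r = new HashMap<>();
        r.put(1, Map.of(0, 5.0, 1, 3.0, 2, 2.0));
        r.put(2, Map.of(0, 3.0, 1, 4.0));
        r.put(3, Map.of(1, 2.0, 2, 5.0));
        SlopeOneSketch s = new SlopeOneSketch(r, 3);
        System.out.println(s.estimate(3, 0)); // 13/3, about 4.33
    }
}
```

On the toy data, user 3 has rated items 1 and 2; the average difference item 0 minus item 1 is 0.5 (two co-raters) and item 0 minus item 2 is 3 (one co-rater), so the estimate for item 0 is ((0.5 + 2)*2 + (3 + 5)*1) / 3 = 13/3.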

Issue faced:
The problem I am facing is that the recommendations that come out are far
too simple: all that seems to be recommended is "if a user does not have
product A, recommend it; if they don't have product B, recommend it," and so
on. Basically, the recommendations are a simple inverse of their ownership
status.

Obviously, I am not doing something right here. How can I model this better
to get the right recommendations? Or is my dataset (300,000 users by 7
products) too small for Mahout to work with?

I look forward to your comments. Thanks.

Re: Newbie question on modeling a Recommender using Mahout when the matrix is sparse

Posted by Gokul Pillai <go...@gmail.com>.
Very true, good catch. I think I was interpreting the results the wrong way.
I expect only the top 5, so I changed the parameter to "5" instead of "10"
and the results are as expected now.

Thanks.

On Wed, Sep 12, 2012 at 11:36 PM, Sean Owen <sr...@gmail.com> wrote:

> Well there are only 7 products in the universe! If you ask for 10
> recommendations, you will always get all unrated items back in the
> recommendations. That's always true unless the algorithm can't
> actually establish a value for some items.
>
> What result were you expecting, fewer than 10 recs? Fewer than 7?

Re: Newbie question on modeling a Recommender using Mahout when the matrix is sparse

Posted by Sean Owen <sr...@gmail.com>.
Well there are only 7 products in the universe! If you ask for 10
recommendations, you will always get all unrated items back in the
recommendations. That's always true unless the algorithm can't
actually establish a value for some items.

What result were you expecting, fewer than 10 recs? Fewer than 7?
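Sean's point can be checked with a generic top-N selection: a recommender only returns items the user does not already have, so with 7 products a call like `recommend(userId, 10)` has at most 7 minus (products owned) candidates and returns all of them, whatever their scores. This is a minimal plain-Java sketch, not Mahout code; the `TopNSketch` class and the score values are made up, and a `NaN` score stands in for an item the algorithm "can't actually establish a value" for.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Generic top-N selection over unowned items (illustrative, not Mahout's code).
public final class TopNSketch {

    // scores[item] is a hypothetical estimated preference; NaN means "no estimate".
    static List<Integer> recommend(double[] scores, Set<Integer> owned, int howMany) {
        List<Integer> candidates = new ArrayList<>();
        for (int item = 0; item < scores.length; item++) {
            // Skip items the user already has and items with no estimate.
            if (!owned.contains(item) && !Double.isNaN(scores[item])) {
                candidates.add(item);
            }
        }
        // Highest estimated score first.
        candidates.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        // Never more than the number of candidates, no matter how large howMany is.
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.1, 0.5, 0.7, 0.3, 0.8, 0.2}; // 7 products, made-up scores
        Set<Integer> owned = Set.of(0, 3);                      // user owns products 0 and 3
        System.out.println(recommend(scores, owned, 10));       // all 5 unowned items
        System.out.println(recommend(scores, owned, 2));        // only the top 2
    }
}
```

With `howMany` = 10 all five unowned items come back, merely sorted by score, which is exactly the "inverse of ownership" pattern described in the question. Only when `howMany` is smaller than the number of unowned items is the list truncated, and only then does the ranking decide which items are left out.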
