Posted to user@mahout.apache.org by Amel Fraisse <am...@gmail.com> on 2010/10/19 13:42:16 UTC

Fastest way to compute correlation between users and generate new recommendations

Hello,

I have recently started using Mahout.
I wanted to ask about the fastest way to compute correlations between a
large number of users and generate new recommendations.

Is it better to use a FileDataModel?

Is it possible to precompute user correlations? If so, where should I store
the results, and how could I use them?

Thank you for your help.




-- 
--------------
Amel Fraisse

Re: Fastest way to compute correlation between users and generate new recommendations

Posted by Sean Owen <sr...@gmail.com>.
That's all right.

Call recommender.refresh() to force an update of everything, including any
internal caches, and the FileDataModel. FileDataModel will try to
intelligently avoid reloading the main data file if it has not changed.

The way you do it here, all correlations are always recomputed every time.
There is no caching. You would need a CachingUserSimilarity wrapper for
that. Calling refresh() clears the entire cache; there's not a way to just
clear the entries affected by the change since this is pretty hard to record
and know efficiently. You could hack up the code to do it.
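
To make the caching behaviour concrete, here is a minimal sketch in plain Java of what a caching wrapper around a similarity does. It is not Mahout's CachingUserSimilarity; the PairSimilarity interface and the class names are hypothetical and exist only for illustration. It shows two points from the paragraph above: a cached pair is not recomputed, and refresh() drops every entry rather than only the ones affected by a change.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a caching wrapper around a (hypothetical) similarity function. */
final class CachingSimilaritySketch {

    /** Hypothetical stand-in for a user-user similarity measure. */
    interface PairSimilarity {
        double similarity(long userA, long userB);
    }

    static final class Caching implements PairSimilarity {
        private final PairSimilarity delegate;
        private final Map<List<Long>, Double> cache = new HashMap<>();

        Caching(PairSimilarity delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized double similarity(long userA, long userB) {
            // Similarity is assumed symmetric, so (a, b) and (b, a) share one entry.
            List<Long> key = Arrays.asList(Math.min(userA, userB), Math.max(userA, userB));
            Double cached = cache.get(key);
            if (cached == null) {
                cached = delegate.similarity(userA, userB);
                cache.put(key, cached);
            }
            return cached;
        }

        /** Clears the whole cache; there is no per-user invalidation in this sketch. */
        synchronized void refresh() {
            cache.clear();
        }
    }
}
```

With a wrapper like this, the expensive computation runs at most once per pair of users until refresh() is called.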

But I'd suggest that you don't need to recompute similarities every time any
data point changes, since most similarities don't change, or if they do,
change only a little. That's an argument to never call refresh() directly.
FileDataModel will pick up updates on its own. Eventually the cached
similarities will be recomputed. Maybe you can force it once a day to be
sure.
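
If you do want to force a refresh once a day, one way to sketch it is with a scheduled executor from the JDK. The class and method names below are hypothetical, and the refresh hook is just a Runnable; in a Mahout setup it would presumably be a call to the recommender's refresh method, which is an assumption about your code rather than something shown here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: runs a refresh hook at a fixed period on a daemon thread. */
final class PeriodicRefreshSketch {

    static ScheduledExecutorService schedule(Runnable refreshHook, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "similarity-refresh");
            t.setDaemon(true); // do not keep the JVM alive just for refreshes
            return t;
        });
        executor.scheduleAtFixedRate(() -> {
            try {
                refreshHook.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so report and continue.
                System.err.println("Refresh failed: " + e);
            }
        }, period, period, unit);
        return executor;
    }
}
```

For a daily refresh the call would look like PeriodicRefreshSketch.schedule(hook, 1, TimeUnit.DAYS), where hook is whatever triggers the refresh in your application.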

The biggest problem this causes is for very new users. They may establish
some user-user similarities based on little data, and then, they are stuck
with those similarities until the cache updates and new info is
incorporated. If it's not a big deal in your setup, then you don't have to
worry about this.


Re: Fastest way to compute correlation between users and generate new recommendations

Posted by Amel Fraisse <am...@gmail.com>.
In fact, I am using PearsonCorrelationSimilarity to compute user
correlations, and then I compute recommendations using the Recommender
class, as follows:

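import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Load the ratings file. `new` never returns null, so no null check is
// needed (the unused getUserIDs() call has been dropped as well).
DataModel model = new FileDataModel(new File("rating.txt"));

// Compute the Pearson correlation on the DataModel
UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(model);

// Compute the k-nearest neighborhood of the current user
// (KNeighborhood is the neighborhood size k, defined elsewhere)
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(KNeighborhood, userSimilarity, model);

// Build the recommender used to get the top N recommendations
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, userSimilarity);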


My questions are: How do I refresh the DataModel when ratings are modified?
Is it possible to run the calculation only on the delta file?

And how do I avoid recomputing user correlations, and then recommendations,
when no ratings have been modified?

Thank you.





-- 
--------------
Amel Fraisse

Re: Fastest way to compute correlation between users and generate new recommendations

Posted by Sean Owen <sr...@gmail.com>.
If you specifically want a correlation, meaning the Pearson correlation,
then you want to use PearsonCorrelationSimilarity. If you just mean you want
some notion of similarity, then any implementation of UserSimilarity could
be used. If speed is your concern, then I would try LogLikelihoodSimilarity.

I am not sure if you want to compute correlations, or compute
recommendations. To compute correlations, use PearsonCorrelationSimilarity
on whatever pairs of users you like to get an answer. To compute
recommendations with an algorithm based on user-user similarity, use
GenericUserBasedRecommender.
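
For reference, the quantity being discussed can be sketched in plain Java as the Pearson correlation of two users' ratings over the items they have both rated. This is an illustration of the formula only, not Mahout's PearsonCorrelationSimilarity, whose handling of edge cases may well differ; the class name and the map-based representation of ratings are assumptions made for the example.

```java
import java.util.Map;

/** Illustrative only: Pearson correlation over the items two users have both rated. */
final class PearsonSketch {

    /**
     * Returns the correlation of the two users' ratings on their co-rated items,
     * or NaN when it is undefined (fewer than two shared items, or no variance
     * in either user's shared ratings). Maps go from item ID to rating.
     */
    static double pearson(Map<Long, Double> ratingsA, Map<Long, Double> ratingsB) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> entry : ratingsA.entrySet()) {
            Double other = ratingsB.get(entry.getKey());
            if (other == null) {
                continue; // item not rated by both users
            }
            double a = entry.getValue();
            double b = other;
            n++;
            sumA += a;
            sumB += b;
            sumAA += a * a;
            sumBB += b * b;
            sumAB += a * b;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double covariance = sumAB - sumA * sumB / n;
        double varianceA = sumAA - sumA * sumA / n;
        double varianceB = sumBB - sumB * sumB / n;
        if (varianceA <= 0 || varianceB <= 0) {
            return Double.NaN;
        }
        return covariance / Math.sqrt(varianceA * varianceB);
    }
}
```

Each pair costs a pass over one user's ratings, so doing this for all pairs of a large user base is quadratic in the number of users, which is why the choice of similarity and any caching matter for speed.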

You can use any DataModel implementation. FileDataModel is fine.

You can do whatever you like, including computing correlations ahead of
time. You can use GenericUserSimilarity to feed in these pre-computed
similarities for use in a Recommender.
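
As a rough idea of what storing and reusing precomputed similarities could look like, here is a plain Java sketch that loads similarity values from text lines into a lookup table. It does not use GenericUserSimilarity itself; the class name and the "userA,userB,similarity" line format are hypothetical choices for the example, and in practice the loaded values would be handed to whatever similarity implementation the Recommender uses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a lookup table of precomputed user-user similarities. */
final class PrecomputedSimilaritySketch {

    private final Map<List<Long>, Double> table = new HashMap<>();

    /** Parses lines of the (hypothetical) form "userA,userB,similarity". */
    static PrecomputedSimilaritySketch fromLines(Iterable<String> lines) {
        PrecomputedSimilaritySketch result = new PrecomputedSimilaritySketch();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad line: " + line);
            }
            long userA = Long.parseLong(parts[0].trim());
            long userB = Long.parseLong(parts[1].trim());
            double value = Double.parseDouble(parts[2].trim());
            result.table.put(key(userA, userB), value);
        }
        return result;
    }

    /** Similarity is assumed symmetric, so each pair is stored once in sorted order. */
    private static List<Long> key(long userA, long userB) {
        return Arrays.asList(Math.min(userA, userB), Math.max(userA, userB));
    }

    /** Returns the stored similarity, or NaN if the pair was never precomputed. */
    double similarity(long userA, long userB) {
        Double value = table.get(key(userA, userB));
        return value == null ? Double.NaN : value;
    }
}
```

The lines could come from a file written by an offline job, for example via Files.readAllLines on a path of your choosing.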
