Posted to user@mahout.apache.org by Jonathan Worthington <jo...@jnthn.net> on 2010/02/11 12:53:57 UTC

Trying to use Taste; where am I going wrong?

Hi!

I'm experimenting with using the Mahout library's Taste implementation 
to provide product recommendations for users as well as identifying 
similar items. The data set is past sales - essentially just a boolean 
relationship "customer X bought item Y". To get something simple 
working - I can optimize and improve later - I just used the file data 
model; my file looks like:

438356039,46305
438356039,46339
438386087,56304
<another 1.5 million or so entries here>

I then create a recommender like:

DataModel Model = new FileDataModel(Path);
ItemSimilarity SimilarityForItems = new PearsonCorrelationSimilarity(Model);
ItemBasedRecommender Item = new GenericItemBasedRecommender(Model, 
SimilarityForItems);

And then do:

List<RecommendedItem> Recommended = Item.mostSimilarItems(ItemID, HowMany);

However, no results are returned. I went digging for why, and wound up 
finding that the itemSimilarity method in AbstractSimilarity was always 
returning NaN. Looking for why, I found that it did indeed find places 
where both users expressed a preference for an item; however, when 
computing the various centered sums they all came out to zero, so 
computeResult then always gives back NaN. If I comment out the call to 
computeResult and instead replace it with one using the non-centered sums:

    //double result = computeResult(count, centeredSumXY, centeredSumX2,
    //                              centeredSumY2, sumXYdiff2);
    double result = computeResult(count, sumXY, sumX2, sumY2, sumXYdiff2);

Then I do get results; a similar hack in userSimilarity gives back 
results from .recommend too.
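
For readers following along, the NaN can be reproduced outside Mahout. The sketch below is a standalone illustration, not Mahout's code: the class name BooleanPearsonDemo and the choice of 1.0 as the value stored for every purchase are assumptions made for the example. When every preference has the same value, subtracting the mean leaves only zeros, so the correlation becomes 0.0 / 0.0.

```java
// Standalone illustration; this is NOT Mahout's implementation.
// Assumption: every boolean "bought" preference is stored as the same value (1.0).
final class BooleanPearsonDemo {

    // Pearson correlation of two equal-length arrays, computed from centered sums.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double centeredSumXY = 0.0;
        double centeredSumX2 = 0.0;
        double centeredSumY2 = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            centeredSumXY += dx * dy;
            centeredSumX2 += dx * dx;
            centeredSumY2 += dy * dy;
        }
        // If either input is constant, the numerator and denominator are both
        // 0.0, and 0.0 / 0.0 evaluates to NaN in Java.
        return centeredSumXY / Math.sqrt(centeredSumX2 * centeredSumY2);
    }

    public static void main(String[] args) {
        // Boolean data: three customers bought both items, all values identical.
        double[] itemA = {1.0, 1.0, 1.0};
        double[] itemB = {1.0, 1.0, 1.0};
        System.out.println(pearson(itemA, itemB)); // prints NaN

        // Real ratings vary, so the centered sums are non-zero.
        double[] ratedA = {1.0, 2.0, 3.0};
        double[] ratedB = {2.0, 4.0, 6.0};
        System.out.println(pearson(ratedA, ratedB)); // prints 1.0
    }
}
```

This also suggests why switching to the non-centered sums yields numbers: the division is no longer 0/0. With constant inputs, though, those numbers carry no information about how the items relate.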

My guess is that I'm more likely to be doing something wrong in how I'm 
using Mahout rather than that I've stumbled on a bug, and naturally I'd 
rather use the library "as it comes" rather than a patched version. :-) 
However, I'm not sure what I'm doing wrong, and I'm also decidedly not 
an expert in this field so I'm not familiar with the details of the 
computations being done here. Any thoughts on where I'm going wrong 
would be welcomed. If it helps to know, I'm using the latest (0.2) release.

Many thanks for any insight,

Jonathan


Re: Trying to use Taste; where am I going wrong?

Posted by Jonathan Worthington <jo...@jnthn.net>.
Sean Owen wrote:
> On Thu, Feb 11, 2010 at 11:53 AM, Jonathan Worthington
> <jo...@jnthn.net> wrote:
>   
>> ItemSimilarity SimilarityForItems = new PearsonCorrelationSimilarity(Model);
>>     
>
> Here's the problem, and it's not obvious. Pearson only works when you
> have ratings, and you don't. It'll end up being unable to compute any
> similarities, so no results.
>
> The fix is easy -- use LogLikelihoodSimilarity or
> TanimotoCoefficientSimilarity, which are defined even without ratings.
> I'd start with log-likelihood.
>
>   
Ah, that helped, thanks. I'll play with both and see how the results 
come out, but using LogLikelihoodSimilarity does indeed seem to resolve 
the "no results" problem. Great!

Maybe it's worth adding a note about this in the doc at the top of 
PearsonCorrelationSimilarity.java like:

<p>Note that this is not suitable in a situation where you have a 
boolean model rather than ratings.</p>

I did read that bit of doc in my hunt for answers, though it didn't 
really click that it wouldn't work without ratings. :-)

> If you're interested in more on this, I'll shamelessly plug early
> access to Mahout in Action. The parts available now include all of the
> coverage of recommenders, including much on this issue.
> http://www.manning.com/owen/
>
>   
I hadn't come across that; looks interesting.

Again, thanks,

Jonathan


Re: Trying to use Taste; where am I going wrong?

Posted by Sean Owen <sr...@gmail.com>.
On Thu, Feb 11, 2010 at 11:53 AM, Jonathan Worthington
<jo...@jnthn.net> wrote:
> 438356039,46305
> 438356039,46339
> 438386087,56304
> <another 1.5 million or so entries here>
>
> I then create a recommender like:
>
> DataModel Model = new FileDataModel(Path);

So far, perfect.

> ItemSimilarity SimilarityForItems = new PearsonCorrelationSimilarity(Model);

Here's the problem, and it's not obvious. Pearson only works when you
have ratings, and you don't. It'll end up being unable to compute any
similarities, so no results.

The fix is easy -- use LogLikelihoodSimilarity or
TanimotoCoefficientSimilarity, which are defined even without ratings.
I'd start with log-likelihood.
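
To show why a set-based measure stays defined without ratings, here is a rough standalone sketch of the Tanimoto (Jaccard) coefficient between two items, each represented by the set of customers who bought it. It is not Mahout's TanimotoCoefficientSimilarity; the class TanimotoDemo and its method are hypothetical names for illustration, and log-likelihood is not shown.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Standalone illustration; this is NOT Mahout's TanimotoCoefficientSimilarity.
final class TanimotoDemo {

    // Tanimoto coefficient: |A intersect B| / |A union B|, where A and B are
    // the sets of customer IDs that bought each item. No rating values needed.
    static double tanimoto(Set<Long> buyersOfA, Set<Long> buyersOfB) {
        Set<Long> intersection = new HashSet<>(buyersOfA);
        intersection.retainAll(buyersOfB);
        int unionSize = buyersOfA.size() + buyersOfB.size() - intersection.size();
        // Only undefined when neither item has any buyers at all.
        return unionSize == 0 ? Double.NaN : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        // Built from the three sample lines in the original post.
        Set<Long> buyers46305 = new HashSet<>(Arrays.asList(438356039L));
        Set<Long> buyers46339 = new HashSet<>(Arrays.asList(438356039L));
        Set<Long> buyers56304 = new HashSet<>(Arrays.asList(438386087L));

        System.out.println(tanimoto(buyers46305, buyers46339)); // prints 1.0
        System.out.println(tanimoto(buyers46305, buyers56304)); // prints 0.0

        // Hypothetical customer IDs with partial overlap: 2 shared out of 4 total.
        Set<Long> a = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> b = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(a, b)); // prints 0.5
    }
}
```

The result depends only on who bought what, which is exactly the information a boolean data set carries.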

If you're interested in more on this, I'll shamelessly plug early
access to Mahout in Action. The parts available now include all of the
coverage of recommenders, including much on this issue.
http://www.manning.com/owen/


> ItemBasedRecommender Item = new GenericItemBasedRecommender(Model,
> SimilarityForItems);
>
> And then do:
>
> List<RecommendedItem> Recommended = Item.mostSimilarItems(ItemID, HowMany);

The rest is fine here. You can also compute recommendations of course,
instead of most similar items.