Posted to user@mahout.apache.org by Manuel Blechschmidt <Ma...@gmx.de> on 2011/11/21 12:06:54 UTC

Evaluation of different recommendation algorithms for 12.000 user data set

Hello Mahout Team, hello users,
a friend and I are currently evaluating recommendation techniques for personalizing a newsletter for a company selling tea, spices, and some other products. Mahout is a great product that saves me hours of time and a lot of money, so to give something back I am writing this small case study for the mailing list.

I am conducting an offline test of which recommender is the most accurate. I am also interested in runtime behavior such as memory consumption and execution time.

The data contains implicit feedback. The preference of a user for an item is the amount in grams that he bought of that product (453 g ~ 1 pound). If this quantity is not available for a product, it is replaced with 50. So basically I want Mahout to predict how much of a certain product a user will buy next. This is also helpful for demand planning. I am currently not using any time data because I did not find a recommender which uses it.
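
For anybody who wants to reproduce this, here is a minimal sketch of how such
data can be fed to Mahout, assuming a hypothetical CSV export orders.csv with
one "userID,itemID,grams" line per purchase (not my exact loading code):

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

// FileDataModel expects "userID,itemID,preferenceValue" per line; the
// preference value here is simply the grams bought (or the fallback of 50).
DataModel myModel = new FileDataModel(new File("orders.csv"));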

Users: 12858
Items: 5467
121304 preferences
MaxPreference: 85850.0 (meaning that someone ordered about 85 kg of a certain tea or spice)
MinPreference: 50.0

Here are the raw benchmarks for accuracy in RMSE. They change by about 15% between runs of the evaluation:

Evaluation of randomBased (baseline): 43045.380570443434 (RandomRecommender(model)) (Time: ~0.3 s) (Memory: 16MB)
Evaluation of ItemBased with Pearson Correlation: 315.5804958647985 (GenericItemBasedRecommender(model, PearsonCorrelationSimilarity(model))) (Time: ~1s)  (Memory: 35MB)
Evaluation of ItemBased with uncentered Cosine: 198.25393235323375 (GenericItemBasedRecommender(model, UncenteredCosineSimilarity(model))) (Time: ~1s)  (Memory: 32MB)
Evaluation of ItemBased with log likelihood: 176.45243607278724 (GenericItemBasedRecommender(model, LogLikelihoodSimilarity(model)))  (Time: ~5s)  (Memory: 42MB)
Evaluation of UserBased 3 with Pearson Correlation: 1378.1188069379868 (GenericUserBasedRecommender(model, NearestNUserNeighborhood(3, PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))  (Time: ~52s) (Memory: 57MB) 
Evaluation of UserBased 20 with Pearson Correlation: 1144.1905989614288 (GenericUserBasedRecommender(model, NearestNUserNeighborhood(20, PearsonCorrelationSimilarity(model), model), PearsonCorrelationSimilarity(model)))  (Time: ~51s) (Memory: 57MB)
Evaluation of SlopeOne: 464.8989330869532 (SlopeOneRecommender(model)) (Time: ~4s) (Memory: 604MB)
Evaluation of SVD based: 326.1050823499026 (ALSWRFactorizer(model, 100, 0.3, 5)) (Time: ) (Memory: 691MB)

These were measured with the following method:

import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;

RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
// arguments: RecommenderBuilder, DataModelBuilder (null = default), DataModel,
// 0.9 = share of each user's preferences used for training, 1.0 = fraction of users evaluated
double evaluation = evaluator.evaluate(randomBased, null, myModel, 0.9, 1.0);
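
Note that evaluate() expects a RecommenderBuilder rather than a ready
Recommender, so every configuration above is wrapped in a small builder.
A sketch for the item based / log likelihood setup (not my literal code):

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;

RecommenderBuilder itemBasedLogLikelihood = new RecommenderBuilder() {
  public Recommender buildRecommender(DataModel model) throws TasteException {
    // same construction as in the corresponding result line above
    return new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));
  }
};
double itemBasedScore = evaluator.evaluate(itemBasedLogLikelihood, null, myModel, 0.9, 1.0);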

Memory usage was about 50 MB in the item-based case. Slope One and the SVD-based recommender seem to use the most memory (615MB & 691MB).

The performance differs a lot. The fastest recommenders were the item-based ones; they took about 1 to 5 seconds (PearsonCorrelationSimilarity and UncenteredCosineSimilarity ~1 s, LogLikelihoodSimilarity ~5 s).
The user-based recommenders were a lot slower.

The conclusion is that in my case the item-based approach is the fastest, has the lowest memory consumption, and is the most accurate. Furthermore, I can use the recommendedBecause function.
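
To illustrate the last point, a rough sketch of how recommendedBecause could
justify a newsletter item (user 123 and item 4711 are made-up IDs, and
itemBasedRecommender stands for a GenericItemBasedRecommender built as above):

import java.util.List;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

// Which already-bought items contributed most to recommending item 4711 to user 123?
List<RecommendedItem> because = itemBasedRecommender.recommendedBecause(123L, 4711L, 3);
for (RecommendedItem item : because) {
  System.out.println("because you bought " + item.getItemID() + " (weight " + item.getValue() + ")");
}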

Here is the spec of the computer:
2.3 GHz Intel Core i5 (4 cores), 1024 MB heap for the Java virtual machine.

As a next step, probably within the next 2 months, I have to design a newsletter and send it to the customers. Then I can benchmark the user acceptance rate of the recommendations.

Any suggestions for enhancements are appreciated. If anybody is interested in the dataset or the evaluation code, send me a private email. I might be able to convince the company to give out the dataset if the person is doing some interesting research.

/Manuel
-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Evaluation of different recommendation algorithms for 12.000 user data set

Posted by Ted Dunning <te...@gmail.com>.
Filtering recommendation lists is incredibly important.  What you are
doing is pretty straightforward with post-processing of the recommended
list.
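
A sketch of such post-processing in Mahout terms, using an IDRescorer to
restrict one section of the newsletter to a hypothetical set of allowed item
IDs (recommender and userID are assumed to exist in the surrounding code):

import java.util.List;
import org.apache.mahout.cf.taste.impl.common.FastIDSet;
import org.apache.mahout.cf.taste.recommender.IDRescorer;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

final FastIDSet allowedItems = new FastIDSet();  // fill with this section's itemIDs
IDRescorer sectionFilter = new IDRescorer() {
  public double rescore(long itemID, double originalScore) {
    return originalScore;                        // keep scores, only filter
  }
  public boolean isFiltered(long itemID) {
    return !allowedItems.contains(itemID);
  }
};
List<RecommendedItem> sectionPicks = recommender.recommend(userID, 3, sectionFilter);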

Other things that I often recommend include:

- dithering.  This is partial randomization of your results list that moves
items deep in the list higher, but mostly leaves the top items in place.
 This helps your algorithm explore more and helps avoid the problem of
people never clicking to the second page.  Dithering can make more
difference than all but the largest algorithm differences.

- anti-flood.  It is important to not have a results list be dominated by a
single kind of thing.  The segregation of your email is a form of this.  I
often implement this by downgrading the scores of items very similar to
higher scoring items.  In some domains this makes a night and day
difference.
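
Rough sketches of both ideas, independent of any particular recommender; the
noise level and the similarity threshold are arbitrary placeholders:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public final class Reranking {

  // Dithering: sort by log(rank) + Gaussian noise, so top items mostly keep
  // their place while items deep in the list occasionally bubble up.
  public static <T> List<T> dither(List<T> ranked, double sigma, Random rng) {
    final double[] keys = new double[ranked.size()];
    List<Integer> order = new ArrayList<Integer>();
    for (int rank = 0; rank < keys.length; rank++) {
      keys[rank] = Math.log(rank + 1) + sigma * rng.nextGaussian();
      order.add(rank);
    }
    Collections.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) { return Double.compare(keys[a], keys[b]); }
    });
    List<T> result = new ArrayList<T>(ranked.size());
    for (int index : order) {
      result.add(ranked.get(index));
    }
    return result;
  }

  // Anti-flood: walk the ranked list and demote items that are too similar to
  // something already accepted, so one kind of item cannot dominate the list.
  public static List<Long> antiFlood(List<Long> rankedItemIds, ItemSimilarity similarity,
                                     double threshold) throws TasteException {
    List<Long> kept = new ArrayList<Long>();
    List<Long> demoted = new ArrayList<Long>();
    for (long candidate : rankedItemIds) {
      boolean tooSimilar = false;
      for (long accepted : kept) {
        if (similarity.itemSimilarity(accepted, candidate) > threshold) {
          tooSimilar = true;
          break;
        }
      }
      if (tooSimilar) {
        demoted.add(candidate);
      } else {
        kept.add(candidate);
      }
    }
    kept.addAll(demoted);  // demoted items move to the back instead of being dropped
    return kept;
  }
}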


Re: Evaluation of different recommendation algorithms for 12.000 user data set

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Thanks for the answer Ted.

On 21.11.2011, at 16:20, Ted Dunning wrote:

> Your product is subject to seasonality constraints (which teas are likely
> right now) and repeat buying.  I would separate out the recommendation of
> repeat buys from the recommendation of new items.

Actually I want to generate an email with diverse recommendations.

Something like:

Your personal top sellers:
.. 3 items ...

Special Winter Sales:
... 3 items ...

This might be interesting for you:
... 6 items ...

This is new in our store:
... 3 items ...

> 
> You may also find that item-item links on your web site are helpful.  These
> are easy to get using this system.

Yes, actually the website is already using some very basic item-to-item recommendations. So I am more interested in the newsletter part, especially because I can track which items are really attractive and which aren't.

/Manuel

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Evaluation of different recommendation algorithms for 12.000 user data set

Posted by Ted Dunning <te...@gmail.com>.
Your product is subject to seasonality constraints (which teas are likely
right now) and repeat buying.  I would separate out the recommendation of
repeat buys from the recommendation of new items.

You may also find that item-item links on your web site are helpful.  These
are easy to get using this system.
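
With the item-based recommender discussed in this thread that is essentially
a single call (sketch; item ID 4711 is made up and itemBasedRecommender stands
for a GenericItemBasedRecommender instance):

import java.util.List;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

// "Customers who bought this also bought ..." links for one product page
List<RecommendedItem> similar = itemBasedRecommender.mostSimilarItems(4711L, 5);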


Re: Evaluation of different recommendation algorithms for 12.000 user data set

Posted by Manuel Blechschmidt <Ma...@gmx.de>.
Hello Sean,

On 21.11.2011, at 12:16, Sean Owen wrote:

> Yes, because you have fewer items, an item-item-similarity-based algorithm
> probably runs much faster.

Thanks for your blazing fast feedback.

> 
> I would not necessarily use the raw number of kg as a preference. It's not
> really true that someone who buys 10kg of an item likes it 10x more than
> one he buys 1kg of. Maybe the second spice is much more valuable? I would
> at least try taking the logarithm of the weight, but, I think this is very
> noisy as a proxy for "preference". It creates illogical leaps -- because
> one user bought 85kg of X, and Y is "similar" to X, this would conclude
> that you're somewhat likely to buy 85kg of Y too. I would probably not use
> weight at all this way.

Thanks for these suggestions. I will consider integrating a logarithmic weight into the recommender. At the moment I am more concerned with getting the user feedback component working. From some manual tests I can already tell that the recommendations for some users make sense.

Based on my own profile I can tell that when I buy more of a certain product, I also like the product more.

I am also thinking about some seasonal tweaking. Tea is a very seasonal product; during winter and Christmas other flavors are sold than in summer. http://diuf.unifr.ch/main/is/sites/diuf.unifr.ch.main.is/files/documents/publications/WS07-08-011.pdf

> 
> It is not therefore surprising that log-likelihood works well, since it
> ignores this value actually.
> 
> (You mentioned RMSE but your evaluation metric is
> average-absolute-difference -- L1, not L2).

You are right, RMSE (root-mean-squared error) is the wrong name. I think it is MAE (mean absolute error).

> 
> This is quite a small data set so you should have no performance issues.
> Your evaluations, which run over all users in the data set, are taking mere
> seconds. I am sure you could get away with much less memory/processing if
> you like.

This is by far good enough. The more important part is sending the newsletter. I have to generate about 10.000 emails, which causes more headaches than the recommender.

/Manuel

-- 
Manuel Blechschmidt
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B


Re: Evaluation of different recommendation algorithms for 12.000 user data set

Posted by Sean Owen <sr...@gmail.com>.
Yes, because you have fewer items, an item-item-similarity-based algorithm
probably runs much faster.

I would not necessarily use the raw number of kg as a preference. It's not
really true that someone who buys 10kg of an item likes it 10x more than
one he buys 1kg of. Maybe the second spice is much more valuable? I would
at least try taking the logarithm of the weight, but, I think this is very
noisy as a proxy for "preference". It creates illogical leaps -- because
one user bought 85kg of X, and Y is "similar" to X, this would conclude
that you're somewhat likely to buy 85kg of Y too. I would probably not use
weight at all this way.
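
A sketch of what a log-damped model could look like; the +1 offset and the
example IDs are arbitrary, and the same transformation could just as well be
applied while writing the preference file:

import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

// One user with two purchases, values damped by log: 50 g -> ~3.9,
// 85 kg -> ~11.4, so huge orders no longer dwarf everything else.
FastByIDMap<PreferenceArray> data = new FastByIDMap<PreferenceArray>();
PreferenceArray prefs = new GenericUserPreferenceArray(2);
prefs.setUserID(0, 123L);
prefs.setItemID(0, 4711L);
prefs.setValue(0, (float) Math.log(1.0 + 50.0));
prefs.setItemID(1, 4712L);
prefs.setValue(1, (float) Math.log(1.0 + 85850.0));
data.put(123L, prefs);
DataModel dampedModel = new GenericDataModel(data);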

It is therefore not surprising that log-likelihood works well, since it
actually ignores this value.

(You mentioned RMSE but your evaluation metric is
average-absolute-difference -- L1, not L2).
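
Concretely, the AverageAbsoluteDifference evaluator reports
MAE = (1/n) * sum(|estimate - actual|), whereas
RMSE = sqrt((1/n) * sum((estimate - actual)^2)); the squaring makes RMSE
punish large misses, such as the 85 kg orders, much harder.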

This is quite a small data set so you should have no performance issues.
Your evaluations, which run over all users in the data set, are taking mere
seconds. I am sure you could get away with much less memory/processing if
you like.

