Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2012/10/27 10:09:45 UTC

Evaluation of Mahout recommenders

Did any of you go to this?

RUE 2012 – Workshop on Recommendation Utility Evaluation: Beyond RMSE 
http://ceur-ws.org/Vol-910/ 

One of the poster sessions was an evaluation of the Mahout recommender:

Case Study Evaluation of Mahout as a Recommender Platform
http://ceur-ws.org/Vol-910/paper10.pdf


Re: Evaluation of Mahout recommenders

Posted by Ted Dunning <te...@gmail.com>.
It is a nice write-up, but the Mahout comparison was a bit of a strawman. I
wish I could have gone to their talk, but I was in office hours right then.
Coincidentally, I was advising somebody that an excellent way to deploy a
recommendation system is on top of Solr. As the regulars here will know, I
have pushed this often since getting good results from such a system over
five years ago at Veoh.

Some observations on best practices before talking about the paper:

- it is always necessary to marry a recommendation system with some kind of
content system; Solr makes that easy.

- using additional characteristics such as cuisine for restaurants, genre
for movies, and other similar things is always a good backup when you have
sparse data.

- using something like LLR (the log-likelihood ratio test) to find
significant item-to-item relationships and then pushing those relations into
a Solr instance lets you blend geo-filtering, content recommendation, and
collaborative filtering in a single framework (see the sketch after this
list).

- the native weighting schemes in Solr generally perform just fine for
recommendations, with no ad hoc adjustments, term boosts, or field boosts.
If term or field boosts are desired, they should not go into the first
system fielded; A/B testing should be used to compare weighting variants
after the first system is built.

- item-to-item relationships are excellent for structuring pages. Often a
Solr instance can help render the site as well as do recommendations.

- cross-recommendations from terms to items should be extracted to improve
search accuracy.
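
To make the LLR bullet concrete, here is a minimal sketch (plain Java; the
counts, item IDs, and the "indicators", "cuisine", and "geo" field names are
made up for illustration, and this is not Mahout's or anyone's production
code) of scoring one item pair with the log-likelihood ratio, plus the shape
of a Solr query that then blends the signals:

// Score an item pair with the G^2 (log-likelihood ratio) statistic computed
// from its 2x2 cooccurrence table, then note the shape of a Solr query that
// blends the resulting indicators with content and geo filters.
public class LlrSketch {

  // x * ln(x), with the convention 0 * ln(0) = 0
  static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized entropy (N * H, in nats) of a set of counts
  static double entropy(long... counts) {
    long sum = 0;
    double parts = 0.0;
    for (long c : counts) {
      parts += xLogX(c);
      sum += c;
    }
    return xLogX(sum) - parts;
  }

  // G^2 for the 2x2 table:
  //   k11 = users who interacted with both A and B
  //   k12 = users with A but not B
  //   k21 = users with B but not A
  //   k22 = users with neither
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double colEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
  }

  public static void main(String[] args) {
    // Example: 20 of 10,000 users touched both items.
    System.out.println(logLikelihoodRatio(20, 80, 100, 9800));

    // Pairs whose LLR clears a threshold become terms in an "indicators"
    // field on the target item's Solr document. A recommendation request is
    // then an ordinary Solr query built from the user's recent items, e.g.
    //
    //   q=indicators:(item42 item97 item311)
    //   fq=cuisine:thai
    //   fq={!geofilt sfield=geo pt=45.5,-122.6 d=10}
    //
    // so Solr's normal ranking blends collaborative, content, and geo
    // signals in one request.
  }
}

The threshold on the LLR score is what does the sparsification: only item
pairs with a clearly significant score get written into the index.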


--- So ---

Relative to this baseline, how did this paper do?

+1 -- they used Solr, but assumed that using Mahout means that you won't
use Solr.

+1 -- they used additional characteristics.

-1 -- I didn't read carefully enough to be absolutely sure, but it appears
that they didn't do a good job of sparsifying item-item relationships and
instead used a more naive system.

-1 -- they didn't make good use of Solr's native weighting and jumped
immediately to ad hoc weightings. They did use some A/B testing, but
apparently believed in their hunches too strongly.

-1 -- they seem to have missed the potential for site navigation.

-1 -- they seem to have no idea about cross-recommendation.

So I would give them about 2 out of 6 points for doing as well as they
could. On the other hand, they have a working system, so they have the moral
high ground from the beginning.

One thing that would have helped them a lot would have been to solicit
reviews from this group before publishing. That would have helped them avoid
the classic problem of comparison papers, where the authors spend lots of
time on the approach they know intimately while spending little time on, and
knowing less about, the alternative approaches.


On Sat, Oct 27, 2012 at 6:19 AM, Sean Owen <sr...@gmail.com> wrote:

> I didn't go but yes it's a nice write-up:
>
>    - The code used is old (about two years old, version 0.4), but that
>    should not be a significant issue.
>    - I didn't understand the bit about having to change it to center data,
>    since the implementation already does (it provides both centered and
>    uncentered versions).
>    - Centering didn't matter much on this data, and I wonder if that's
>    because the "uncentered" implementation was in fact centered; I don't
>    know.
>    - Weighting is a made-up way to give more weight to item-item
>    similarity based on more data points, which patches up undesirable
>    behavior of something like Pearson here. It weights by
>    (1 - count/numItems), which is not super principled; it should have
>    been based on the standard deviation of the series. I am not sure I
>    agree with arbitrarily changing the weight to "count/50" and capping
>    it at 1; that just seems even more arbitrary.
>    - Weighting helps out Pearson, yes indeed.
>    - A conclusion of the paper was that they'd found tweaks to improve the
>    baseline performance... but the baseline performance is never shown
>    here, and the default is to center and to use weighting. Since the
>    modified centering and weighting was the best approach in the graphs,
>    I do wonder whether the defaults would have done as well. I assume not
>    (?) for this data set, but they probably should have been included.
>    - The overall analysis of a tradeoff between 'coverage' and accuracy
>    is a good and useful one.
>    - The biggest problem identified with this neighborhood approach is
>    sparsity. Indeed you can't make predictions for a lot of items, and
>    that's generally bad.
>
>
> On Sat, Oct 27, 2012 at 9:09 AM, Lance Norskog <go...@gmail.com> wrote:
> > Did any of you go to this?
> >
> > RUE 2012 – Workshop on Recommendation Utility Evaluation: Beyond RMSE
> > http://ceur-ws.org/Vol-910/
> >
> > One of the poster sessions was an evaluation of the Mahout recommender:
> >
> > Case Study Evaluation of Mahout as a Recommender Platform
> > http://ceur-ws.org/Vol-910/paper10.pdf
> >
>

Re: Evaluation of Mahout recommenders

Posted by Sean Owen <sr...@gmail.com>.
I didn't go but yes it's a nice write-up:

   - The code used is old (about two years old, version 0.4), but that
   should not be a significant issue.
   - I didn't understand the bit about having to change it to center data,
   since the implementation already does (it provides both centered and
   uncentered versions).
   - Centering didn't matter much on this data, and I wonder if that's
   because the "uncentered" implementation was in fact centered; I don't
   know.
   - Weighting is a made-up way to give more weight to item-item
   similarity based on more data points, which patches up undesirable
   behavior of something like Pearson here. It weights by
   (1 - count/numItems), which is not super principled; it should have been
   based on the standard deviation of the series. I am not sure I agree
   with arbitrarily changing the weight to "count/50" and capping it at 1;
   that just seems even more arbitrary. (Both weighting terms are sketched
   numerically after this list.)
   - Weighting helps out Pearson, yes indeed.
   - A conclusion of the paper was that they'd found tweaks to improve the
   baseline performance... but the baseline performance is never shown
   here, and the default is to center and to use weighting. Since the
   modified centering and weighting was the best approach in the graphs, I
   do wonder whether the defaults would have done as well. I assume not (?)
   for this data set, but they probably should have been included.
   - The overall analysis of a tradeoff between 'coverage' and accuracy is
   a good and useful one.
   - The biggest problem identified with this neighborhood approach is
   sparsity. Indeed you can't make predictions for a lot of items, and
   that's generally bad.
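
For concreteness, here is a small sketch of the pieces discussed above: the
centered (Pearson) and uncentered similarities that ship with Mahout's Taste
API, the weighted flavor of Pearson, and the arithmetic of the two weighting
terms quoted from the paper. The file name, user ID, and counts are
illustrative assumptions, the class names are from the Taste API of roughly
that era, and this is not the paper's code:

import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.common.Weighting;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.UncenteredCosineSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

public class BaselineSketch {
  public static void main(String[] args) throws Exception {
    // "ratings.csv" is a placeholder for whatever preference data is at hand.
    DataModel model = new FileDataModel(new File("ratings.csv"));

    // Centered and uncentered similarities both ship with Mahout; the
    // weighted variant of Pearson is selected with the Weighting argument.
    // (The unweighted and uncentered ones are shown only to name the options.)
    ItemSimilarity pearson = new PearsonCorrelationSimilarity(model);
    ItemSimilarity weightedPearson =
        new PearsonCorrelationSimilarity(model, Weighting.WEIGHTED);
    ItemSimilarity uncentered = new UncenteredCosineSimilarity(model);

    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, weightedPearson);
    List<RecommendedItem> top10 = recommender.recommend(1L, 10);
    System.out.println(top10);

    // The two weighting terms mentioned above, for a pair of items co-rated
    // by `count` users, with `numItems` items in the data set overall.
    long count = 20;
    long numItems = 1000;
    double original = 1.0 - (double) count / numItems;  // 1 - count/numItems
    double modified = Math.min(1.0, count / 50.0);      // count/50, capped at 1
    System.out.println(original + " vs " + modified);
  }
}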


On Sat, Oct 27, 2012 at 9:09 AM, Lance Norskog <go...@gmail.com> wrote:
> Did any of you go to this?
>
> RUE 2012 – Workshop on Recommendation Utility Evaluation: Beyond RMSE
> http://ceur-ws.org/Vol-910/
>
> One of the poster sessions was an evaluation of the Mahout recommender:
>
> Case Study Evaluation of Mahout as a Recommender Platform
> http://ceur-ws.org/Vol-910/paper10.pdf
>