Posted to user@mahout.apache.org by lee carroll <le...@googlemail.com> on 2011/10/25 19:50:26 UTC

Average Absolute Difference Recommender Evaluator metric

What does the metric returned by
AverageAbsoluteDifferenceRecommenderEvaluator mean for non-rating-based
recommenders?

The Mahout in Action book describes the metric as the amount by which a
prediction would differ from the actual rating (the lower the better).
But what does that mean for a recommender which uses a similarity
measure that does not use rating data, such as Jaccard, or for that
matter measures which use rank?

Example:
Say we get a 1.2 AAD for a recommender using Euclidean distance.
Ratings range from 1 to 10, so I'm thinking this is pretty good: we are
out by a little over 1. We would make the mistake of
predicting around 6 or 8 when the actual preference is a seven.

But

What does a 1.3 AAD for a Tanimoto-based recommender mean? And can I
compare it with other recommenders' AADs? (I'm sure you can, as the
excellent Mahout book does :-)

What am I missing? Do I have too simplistic a view of the AAD metric?

Thanks in advance Lee C
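
For concreteness, AAD figures like the ones above would typically come out
of Mahout's taste evaluator. A rough sketch follows; the ratings file name,
the neighbourhood size of 10 and the 90/10 split are illustrative
placeholders, not anything stated in this thread.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

public class AadComparison {

  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed(); // fixed seed so the train/test split is repeatable
    // "ratings.csv" is a placeholder: userID,itemID,rating lines on a 1-10 scale
    DataModel model = new FileDataModel(new File("ratings.csv"));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

    // 90% of each user's prefs are used for training, the rest are held out and "predicted"
    System.out.println("Euclidean AAD: "
        + evaluator.evaluate(builderFor(true), null, model, 0.9, 1.0));
    System.out.println("Tanimoto AAD:  "
        + evaluator.evaluate(builderFor(false), null, model, 0.9, 1.0));
  }

  // Same user-based recommender in both cases; only the similarity measure differs.
  private static RecommenderBuilder builderFor(final boolean euclidean) {
    return new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel m) throws TasteException {
        UserSimilarity similarity = euclidean
            ? new EuclideanDistanceSimilarity(m)
            : new TanimotoCoefficientSimilarity(m);
        UserNeighborhood hood = new NearestNUserNeighborhood(10, similarity, m);
        return new GenericUserBasedRecommender(m, hood, similarity);
      }
    };
  }
}

Both calls return a single double; the question in this thread is what that
number means when the Tanimoto-based recommender isn't really estimating
ratings at all.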

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Sean Owen <sr...@gmail.com>.
The term "boolean recommender" is maybe more my own invention than
anything, but I think it's an accurate label for contexts where you
just have an association, or you don't -- no rating. In this case,
recommenders still work, but I don't think they can be said to be
estimating ratings. What you get from estimatePreference() for these
implementations is useful for ranking but isn't really an estimated
pref.
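
For example (a minimal sketch, assuming Mahout's boolean-pref classes;
the file name, user ID and item ID are made up), the value below orders
candidate items for a user but does not sit on any rating scale:

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class BooleanEstimateDemo {

  public static void main(String[] args) throws Exception {
    // "associations.csv" is a placeholder: userID,itemID lines with no rating column
    DataModel model = new FileDataModel(new File("associations.csv"));
    UserSimilarity similarity = new TanimotoCoefficientSimilarity(model);
    UserNeighborhood hood = new NearestNUserNeighborhood(10, similarity, model);
    Recommender rec = new GenericBooleanPrefUserBasedRecommender(model, hood, similarity);

    // A similarity-derived score for item 42 and user 1: fine for ranking,
    // but not an estimate of a rating the user would give.
    float score = rec.estimatePreference(1L, 42L);
    System.out.println("ranking score = " + score);
  }
}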

On Tue, Oct 25, 2011 at 8:55 PM, lee carroll
<le...@googlemail.com> wrote:
> I've not come across the terms boolean / non boolean recommenders
> before. I thought they all worked by
> estimating preferences.
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Ted Dunning <te...@gmail.com>.
I don't think that AAD is a good way to compare the recommendations.

A good way is to think about the application.  In that application, you are
likely to show about a page of recommendations.  The only question that is
important is whether or not users find that page useful.

One useful surrogate to answer that question is AUC.  Another is
precision@20.  The key to both of these is that they get at the heart
of the question of whether or not the recommendations are ordered
correctly.
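
No code for this appears in the thread, but a precision-at-20 run with
Mahout's IR-stats evaluator might look roughly like the sketch below; the
file name, similarity choice and neighbourhood size are illustrative, not
taken from the discussion.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class PrecisionAt20 {

  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("ratings.csv")); // placeholder file
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

    RecommenderBuilder builder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel m) throws TasteException {
        UserSimilarity similarity = new TanimotoCoefficientSimilarity(m);
        UserNeighborhood hood = new NearestNUserNeighborhood(10, similarity, m);
        return new GenericUserBasedRecommender(m, hood, similarity);
      }
    };

    // Precision and recall at a recommendation-list size of 20; the relevance
    // threshold is chosen automatically per user.
    IRStatistics stats = evaluator.evaluate(builder, null, model, null, 20,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("precision@20 = " + stats.getPrecision());
    System.out.println("recall@20    = " + stats.getRecall());
  }
}

Because this only asks whether held-out relevant items show up near the top
of the list, the score stays comparable across rating-based and boolean
recommenders, which is exactly what AAD cannot offer here.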

On Tue, Oct 25, 2011 at 4:12 PM, lee carroll
<le...@googlemail.com>wrote:

> >No, you're welcome to make comparisons in these tables. It's valid.
>
> Okay I think I'm back at square one.
> So we have an AAD of 1.2 using a Euclidean similarity measure. This is
> calculated for ratings in the range of 1 through to 10.
> For the same data we also have a Tanimoto AAD of 1.3.
>
> Now imagine the ratings are instead in the range of 1 through to 20, but
> all the users rate in exactly the same way, with every value doubled
> ((rating value) * 2).
> We would now have an AAD of 2.4 for the Euclidean-driven recommender,
> but the Tanimoto AAD would still be 1.3.
>
> How can we use AAD to compare the two recommenders?
>
> A bit of background just to explain why I'm labouring this point (and
> I'm well aware that I'm labouring it).
> Being able to describe AAD as "the amount a prediction would differ
> from the actual rating (the lower the better)"
> to a business stakeholder makes the evaluation of the recommender
> vivid and concrete. The confidence this
> creates is not to be underestimated. However, how do I describe to a
> business stakeholder the meaning of a Tanimoto-produced
> AAD? I can't at the moment :-)
>
> cheers Lee C
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by lee carroll <le...@googlemail.com>.
>No, you're welcome to make comparisons in these tables. It's valid.

Okay I think I'm back at square one.
So we have an AAD of 1.2 using a Euclidean similarity measure. This is
calculated for ratings in the range of 1 through to 10.
For the same data we also have a Tanimoto AAD of 1.3.

Now imagine the ratings are instead in the range of 1 through to 20, but
all the users rate in exactly the same way, with every value doubled
((rating value) * 2).
We would now have an AAD of 2.4 for the Euclidean-driven recommender,
but the Tanimoto AAD would still be 1.3.

How can we use AAD to compare the two recommenders?

A bit of background just to explain why I'm labouring this point (and
I'm well aware that I'm labouring it).
Being able to describe AAD as "the amount a prediction would differ
from the actual rating (the lower the better)"
to a business stakeholder makes the evaluation of the recommender
vivid and concrete. The confidence this
creates is not to be underestimated. However, how do I describe to a
business stakeholder the meaning of a Tanimoto-produced
AAD? I can't at the moment :-)

cheers Lee C

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Sean Owen <sr...@gmail.com>.
(This is a little different -- the estimate isn't in [0,1], it is in
[1,1], i.e. always exactly 1! The commentary is right, in the abstract.)

On Tue, Oct 25, 2011 at 9:26 PM, Ted Dunning <te...@gmail.com> wrote:
> Speaking statistically, AAD has some interesting issues when you are trying
> to estimate a boolean value.
>
> In this framework, you produce an estimate in [0,1] of an actual value that
> is in {0,1}.  If you guess binary results, you will have an error of 0 or 1.
>  If you guess intermediate values, you are guaranteed to have a non-zero
> error.  If the actual probability of getting a 1 is p and you estimate x,
> then the expected value of AAD is (1-p) x + p (1-x) =
> x-px + p-px = p + x (1-2p).  If p<0.5, this is minimized by setting x = 0
> and if p>0.5, by setting x = 1.  Thus, you minimize AAD by only guessing
> binary values (which our recommenders never do, btw).
>
> As such, if you really want to use AAD as a quality metric, you may want to
> put in a step that clamps the output to 0 or 1 before evaluating.

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Ted Dunning <te...@gmail.com>.
Speaking statistically, AAD has some interesting issues when you are trying
to estimate a boolean value.

In this framework, you produce an estimate in [0,1] of an actual value that
is in {0,1}.  If you guess binary results, you will have an error of 0 or 1.
 If you guess intermediate values, you are guaranteed to have a non-zero
error.  If the actual probability of getting a 1 is p and you estimate x,
then the expected value of AAD is (1-p) x + p (1-x) =
x-px + p-px = p + x (1-2p).  If p<0.5, this is minimized by setting x = 0
and if p>0.5, by setting x = 1.  Thus, you minimize AAD by only guessing
binary values (which our recommenders never do, btw).

As such, if you really want to use AAD as a quality metric, you may want to
put in a step that clamps the output to 0 or 1 before evaluating.
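
A tiny stand-alone check of that algebra (the values of p below are
arbitrary): the expected AAD is (1-p)*x + p*(1-x) = p + x*(1 - 2p), and a
brute-force scan over x confirms the minimum sits at x = 0 when p < 0.5
and at x = 1 when p > 0.5.

public class ExpectedAad {

  // Expected absolute error when the true value is 1 with probability p
  // and the recommender always guesses x in [0,1]:
  // (1-p)*x + p*(1-x) = p + x*(1 - 2p)
  static double expectedAad(double p, double x) {
    return (1 - p) * x + p * (1 - x);
  }

  public static void main(String[] args) {
    for (double p : new double[] {0.2, 0.8}) {
      double bestX = 0.0;
      double bestAad = Double.MAX_VALUE;
      for (int i = 0; i <= 10; i++) {
        double x = i / 10.0;
        double aad = expectedAad(p, x);
        if (aad < bestAad) {
          bestAad = aad;
          bestX = x;
        }
      }
      // Prints x = 0.0 for p = 0.2 and x = 1.0 for p = 0.8: binary guesses win.
      System.out.println("p = " + p + ": expected AAD minimised at x = " + bestX
          + " (AAD = " + bestAad + ")");
    }
  }
}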

On Tue, Oct 25, 2011 at 1:05 PM, Sean Owen <sr...@gmail.com> wrote:

> Well it is also a property of the recommender. If you throw a "normal"
> implementation at your data it will happily estimate, correctly, that
> all unknown ratings are 1. It's these other variants that do something
> different and meaningful.
>
> The reverse is fine -- you can use similarity metrics that don't
> assume ratings on data that does have ratings.
>
> No, you're welcome to make comparisons in these tables. It's valid.
>
> On Tue, Oct 25, 2011 at 9:02 PM, lee carroll
> <le...@googlemail.com> wrote:
> > Ah, you did not say boolean / non-boolean recommenders; you were talking
> > about boolean preference ratings.
> >
> > Ok I think I have it.
> >
> > I'm up to chapter 5 in the Mahout in Action book (so please bear with
> > me :-)
> > So is it fair to say tables 5.1 and 5.2 should avoid the comparisons
> > between the top two?
> >
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Sean Owen <sr...@gmail.com>.
Well it is also a property of the recommender. If you throw a "normal"
implementation at your data it will happily estimate, correctly, that
all unknown ratings are 1. It's these other variants that do something
different and meaningful.

The reverse is fine -- you can use similarity metrics that don't
assume ratings on data that does have ratings.

No, you're welcome to make comparisons in these tables. It's valid.

On Tue, Oct 25, 2011 at 9:02 PM, lee carroll
<le...@googlemail.com> wrote:
> Ah, you did not say boolean / non-boolean recommenders; you were talking
> about boolean preference ratings.
>
> Ok I think I have it.
>
> I'm up to chapter 5 in the Mahout in Action book (so please bear with me :-)
> So is it fair to say tables 5.1 and 5.2 should avoid the comparisons
> between the top two?
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by lee carroll <le...@googlemail.com>.
Ah, you did not say boolean / non-boolean recommenders; you were talking
about boolean preference ratings.

Ok I think I have it.

I'm up to chapter 5 in the Mahout in Action book (so please bear with me :-)
So is it fair to say tables 5.1 and 5.2 should avoid the comparisons
between the top two?



On 25 October 2011 20:55, lee carroll <le...@googlemail.com> wrote:
> I've not come across the terms boolean / non boolean recommenders
> before. I thought they all worked by
> estimating preferences.
>
>
>
> On 25 October 2011 19:13, Sean Owen <sr...@gmail.com> wrote:
>> You should be able to compare across all of the "non-boolean"
>> recommenders as they all operate by estimating preferences.
>>
>> But for the rest, it's not meaningful for any comparison.
>>
>> On Tue, Oct 25, 2011 at 7:04 PM, lee carroll
>> <le...@googlemail.com> wrote:
>>> So when comparing within a technique, AAD or RMS is fine, but when
>>> comparing across recommenders using a variety of similarities it's
>>> best to stick to IR measures.
>>
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by lee carroll <le...@googlemail.com>.
I've not come across the terms boolean / non boolean recommenders
before. I thought they all worked by
estimating preferences.



On 25 October 2011 19:13, Sean Owen <sr...@gmail.com> wrote:
> You should be able to compare across all of the "non-boolean"
> recommenders as they all operate by estimating preferences.
>
> But for the rest, it's not meaningful for any comparison.
>
> On Tue, Oct 25, 2011 at 7:04 PM, lee carroll
> <le...@googlemail.com> wrote:
>> So when comparing within a technique, AAD or RMS is fine, but when
>> comparing across recommenders using a variety of similarities it's
>> best to stick to IR measures.
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Sean Owen <sr...@gmail.com>.
You should be able to compare across all of the "non-boolean"
recommenders as they all operate by estimating preferences.

But for the rest, it's not meaningful for any comparison.

On Tue, Oct 25, 2011 at 7:04 PM, lee carroll
<le...@googlemail.com> wrote:
> So when comparing within a technique, AAD or RMS is fine, but when
> comparing across recommenders using a variety of similarities it's
> best to stick to IR measures.

Re: Average Absolute Difference Recommender Evaluator metric

Posted by lee carroll <le...@googlemail.com>.
So when comparing within a technique, AAD or RMS is fine, but when
comparing across recommenders using a variety of similarities it's
best to stick to IR measures.



On 25 October 2011 18:52, Sean Owen <sr...@gmail.com> wrote:
> It's fairly meaningless, as there are no prefs in this case, so no
> such thing as estimated prefs to compare against real ones.
> The recommender does rank on a metric, but it's not an estimated pref in
> this case. I imagine it will spit out a number but it's not going to
> be of much use.
>
> All you can really do here is use precision/recall tests.
>
> On Tue, Oct 25, 2011 at 6:50 PM, lee carroll
> <le...@googlemail.com> wrote:
>> What does the metric returned by
>> AverageAbsoluteDifferenceRecommenderEvaluator mean for non-rating-based
>> recommenders?
>>
>> The Mahout in Action book describes the metric as the amount by which a
>> prediction would differ from the actual rating (the lower the better).
>> But what does that mean for a recommender which uses a similarity
>> measure that does not use rating data, such as Jaccard, or for that
>> matter measures which use rank?
>>
>> Example:
>> Say we get a 1.2 AAD for a recommender using Euclidean distance.
>> Ratings range from 1 to 10, so I'm thinking this is pretty good: we are
>> out by a little over 1. We would make the mistake of
>> predicting around 6 or 8 when the actual preference is a seven.
>>
>> But
>>
>> What does a 1.3 AAD for a Tanimoto-based recommender mean? And can I
>> compare it with other recommenders' AADs? (I'm sure you can, as the
>> excellent Mahout book does :-)
>>
>> What am I missing? Do I have too simplistic a view of the AAD metric?
>>
>> Thanks in advance Lee C
>>
>

Re: Average Absolute Difference Recommender Evaluator metric

Posted by Sean Owen <sr...@gmail.com>.
It's fairly meaningless, as there are no prefs in this case, so no
such thing as estimated prefs to compare against real ones.
The recommender does rank on a metric, but it's not an estimated pref in
this case. I imagine it will spit out a number but it's not going to
be of much use.

All you can really do here is use precision/recall tests.

On Tue, Oct 25, 2011 at 6:50 PM, lee carroll
<le...@googlemail.com> wrote:
> What does the metric returned by
> AverageAbsoluteDifferenceRecommenderEvaluator mean for non-rating-based
> recommenders?
>
> The Mahout in Action book describes the metric as the amount by which a
> prediction would differ from the actual rating (the lower the better).
> But what does that mean for a recommender which uses a similarity
> measure that does not use rating data, such as Jaccard, or for that
> matter measures which use rank?
>
> Example:
> Say we get a 1.2 AAD for a recommender using Euclidean distance.
> Ratings range from 1 to 10, so I'm thinking this is pretty good: we are
> out by a little over 1. We would make the mistake of
> predicting around 6 or 8 when the actual preference is a seven.
>
> But
>
> What does a 1.3 AAD for a Tanimoto-based recommender mean? And can I
> compare it with other recommenders' AADs? (I'm sure you can, as the
> excellent Mahout book does :-)
>
> What am I missing? Do I have too simplistic a view of the AAD metric?
>
> Thanks in advance Lee C
>