You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2010/10/29 10:57:37 UTC

Order-based evalution of recommenders

I've written an evaluator of recommenders that compares the order of
recommendations, rather than the nominal preference values. I'm happy
with how well it works now. It is a variant of 'Wilcoxon ranking'.
Ranking is unforch N^2 for N recommendations. The "ranking" algorithm
is, frankly, baffling but it works.

http://comp9.psych.cornell.edu/Darlington/normscor.htm

My recommender project uses a different numerical space for
preferences and data models, so the existing AbsoluteValue evaluator
was useless. This order evaluator requires that two recommendation
sequences have the same items, but in different order. If two
sequences are almost but not quite the same, the "Sloppy Hamming" and
the Wilcoxon Ranking scores correlate well, so that means the Wilcoxon
Ranking score is tuned.

Sloppy Hamming: V1[N] must match any of V2[N-1], V2[N] or V2[N+1] to
create a 'true' at position N. There must be a name for this.

-- 
Lance Norskog
goksron@gmail.com

Re: Order-based evalution of recommenders

Posted by Ted Dunning <te...@gmail.com>.

AUC is the standard measure here.  Since it is essentially identical to
Wilcoxon's test or the Mann-Whitney test, there is a long pedigree.

Computing it from list order is pretty straightforward.  The
class org.apache.mahout.classifier.evaluation.AUC has an implementation that
could be adapted.

The idea is that AUC can be defined as the average probability that a random
true positive has a higher score (appears earlier) than a
random true negative example.  If you assume that all items that do not
appear in your reference set are negatives, then you can
iterate over the examples in your reference list averaging rank(reference) >
rank(test) ? 1 : 0.  If this value is 0.5, then your test case is
as good as the reference.  If > 0.5, then your test case is better (don't
think this is possible) and if <0.5 then the test case is worse.

If you have actual user feedback on the recommendations, say by recording
clicks, then you can directly average the degree to which the score for a
clicked item is above the score for a random unclicked item.  As this
measure approaches 1, you have a perfect
recommender.

On Sat, Oct 30, 2010 at 7:51 PM, Lance Norskog <go...@gmail.com> wrote:

> This order-comparing evaluator runs the algorithm against the two
> symbol sequences. So, if they both return the same results in a
> similar order, the algorithm evaluates the amount of difference. The
> smaller the better.
>
> If I have three "reference" recommenders that have small deltas
> against each other, and my recommender has a large delta against all
> of them, then my recommender is having problems.
>

Re: Order-based evalution of recommenders

Posted by Lance Norskog <go...@gmail.com>.

Yes. One algorithm is this:
Given two recommenders using the same data model, request the same
list of user-item prefs.
The two lists have to have the same symbols, just in a different order.

This order-comparing evaluator runs the algorithm against the two
symbol sequences. So, if they both return the same results in a
similar order, the algorithm evaluates the amount of difference. The
smaller the better.

If I have three "reference" recommenders that have small deltas
against each other, and my recommender has a large delta against all
of them, then my recommender is having problems.

On Fri, Oct 29, 2010 at 4:03 AM, Steven Bourke <sb...@gmail.com> wrote:
> When you say the order do you mean the ranking of the recommendations which
> are returned to the user?
>
> On Fri, Oct 29, 2010 at 9:57 AM, Lance Norskog <go...@gmail.com> wrote:
>
>> I've written an evaluator of recommenders that compares the order of
>> recommendations, rather than the nominal preference values. I'm happy
>> with how well it works now. It is a variant of 'Wilcoxon ranking'.
>> Ranking is unforch N^2 for N recommendations. The "ranking" algorithm
>> is, frankly, baffling but it works.
>>
>> http://comp9.psych.cornell.edu/Darlington/normscor.htm
>>
>> My recommender project uses a different numerical space for
>> preferences and data models, so the existing AbsoluteValue evaluator
>> was useless. This order evaluator requires that two recommendation
>> sequences have the same items, but in different order. If two
>> sequences are almost but not quite the same, the "Sloppy Hamming" and
>> the Wilcoxon Ranking scores correlate well, so that means the Wilcoxon
>> Ranking score is tuned.
>>
>> Sloppy Hamming: V1[N] must match any of V2[N-1], V2[N] or V2[N+1] to
>> create a 'true' at position N. There must be a name for this.
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
>

-- 
Lance Norskog
goksron@gmail.com

Re: Order-based evalution of recommenders

Posted by Steven Bourke <sb...@gmail.com>.

When you say the order do you mean the ranking of the recommendations which
are returned to the user?

On Fri, Oct 29, 2010 at 9:57 AM, Lance Norskog <go...@gmail.com> wrote:

> I've written an evaluator of recommenders that compares the order of
> recommendations, rather than the nominal preference values. I'm happy
> with how well it works now. It is a variant of 'Wilcoxon ranking'.
> Ranking is unforch N^2 for N recommendations. The "ranking" algorithm
> is, frankly, baffling but it works.
>
> http://comp9.psych.cornell.edu/Darlington/normscor.htm
>
> My recommender project uses a different numerical space for
> preferences and data models, so the existing AbsoluteValue evaluator
> was useless. This order evaluator requires that two recommendation
> sequences have the same items, but in different order. If two
> sequences are almost but not quite the same, the "Sloppy Hamming" and
> the Wilcoxon Ranking scores correlate well, so that means the Wilcoxon
> Ranking score is tuned.
>
> Sloppy Hamming: V1[N] must match any of V2[N-1], V2[N] or V2[N+1] to
> create a 'true' at position N. There must be a name for this.
>
> --
> Lance Norskog
> goksron@gmail.com
>