Posted to user@mahout.apache.org by Anatoliy Kats <a....@rambler-co.ru> on 2011/11/29 17:09:02 UTC

Evaluating recommendations with expired items

Hi,

I brought up this question on the dev list a few weeks ago.  I have a 
recommendation algorithm that learns its similarity matrix from both 
current items and expired ones that should not be recommended.  
However, AverageAbsoluteDifferenceRecommenderEvaluator compares the 
predicted and actual ratings for all items, expired or not.  I believe 
the evaluation would be more realistic if it skipped expired items, 
since that corresponds more closely to how the algorithm is deployed in 
production.  For example, newer items generally have fewer clicks, so 
an evaluation restricted to current items would emphasize the cold 
start problem we would experience in production.

The evaluation uses expired items even if I write a recommender class 
that forces all recommendations through an IDRescorer that sets the 
scores of expired items to NaN.  The reason is that the ...Evaluator 
calls the Recommender::doEstimatePreference function to calculate the 
predicted rating, bypassing the recommend function.  I confirmed that 
expired items are present by running my recommender in the debugger and 
checking the item IDs when doEstimatePreference is called.

Do I understand the evaluator's behavior correctly?  Would you consider 
this a bug?

Thanks,

Anatoliy

Re: Evaluating recommendations with expired items

Posted by Anatoliy Kats <a....@rambler-co.ru>.

Hi Sean,

OK, I understand, thanks.  I am working with Boolean data for the time 
being, so I'm using the IRStatsEvaluator.  But I'll revisit the issue if 
and when I go back to integer preferences.

On 11/29/2011 08:19 PM, Sean Owen wrote:
> The recommendation process ends with steps:
>
> 1. Estimate a pref for each candidate item
> 2. (Optionally, rescore or filter those pref values)
> 3. Sort by estimated pref and return top items by pref
>
> The evaluator is not evaluating the result at step #3, but at step #1 -- as
> a proxy for evaluating the quality of the ultimate recommendations. It's
> not necessarily any less valid to see how well it estimates the pref for an
> item that happens to be expired. So yes I'd say the current behavior is
> intended.
>
> I take your point though. You could fairly easily
> modify AbstractDifferenceRecommenderEvaluator to construct whatever test
> and training data set you like. For example, you would probably put all
> expired items in your training set and not in the test set.
>
> If you're OK just modifying the code, go for that.
> If you'd like to think of a clean way to incorporate a hook that lets you
> replace the random test/training selection with custom logic, that's cool
> too. I think it would be some work, if not a great deal, to cleanly
> refactor out the random sampling.

Re: Evaluating recommendations with expired items

Posted by Sean Owen <sr...@gmail.com>.

The recommendation process ends with these steps:

1. Estimate a pref for each candidate item
2. (Optionally, rescore or filter those pref values)
3. Sort by estimated pref and return top items by pref

The evaluator is not evaluating the result at step #3, but at step #1 -- as
a proxy for evaluating the quality of the ultimate recommendations. It's
not necessarily any less valid to see how well it estimates the pref for an
item that happens to be expired. So yes I'd say the current behavior is
intended.
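
To make the distinction concrete, here is a minimal sketch of the three
steps and of an evaluation that stops at step 1. It is a toy model, not
Mahout code: the class PipelineSketch, its methods, and the lookup-table
"model" are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical toy model, not Mahout code: it only mirrors the three steps above.
final class PipelineSketch {

    // Step 1: estimate a preference for one item (here a fixed lookup table).
    static double estimate(Map<Long, Double> model, long itemId) {
        return model.getOrDefault(itemId, Double.NaN);
    }

    // Steps 1-3: estimate, filter, then sort and keep the top items.
    static List<Long> recommend(Map<Long, Double> model, LongPredicate isFiltered, int howMany) {
        List<Long> candidates = new ArrayList<>();
        for (long itemId : model.keySet()) {
            if (!isFiltered.test(itemId)) { // step 2: drop filtered (e.g. expired) items
                candidates.add(itemId);
            }
        }
        // Step 3: highest estimate first.
        candidates.sort(Comparator.comparingDouble((Long id) -> estimate(model, id)).reversed());
        return candidates.subList(0, Math.min(howMany, candidates.size()));
    }

    // A difference-style evaluation: compares step-1 estimates with held-out
    // ratings. The filter from step 2 is never consulted here.
    static double averageAbsoluteDifference(Map<Long, Double> model, Map<Long, Double> heldOut) {
        double total = 0.0;
        int count = 0;
        for (Map.Entry<Long, Double> e : heldOut.entrySet()) {
            double estimated = estimate(model, e.getKey());
            if (!Double.isNaN(estimated)) { // skip items the model cannot estimate
                total += Math.abs(estimated - e.getValue());
                count++;
            }
        }
        return count == 0 ? Double.NaN : total / count;
    }

    public static void main(String[] args) {
        // Assumption for the demo: item 2 plays the role of an expired item.
        Map<Long, Double> model = Map.of(1L, 4.0, 2L, 5.0, 3L, 3.0);
        System.out.println(recommend(model, id -> id == 2L, 2));
        System.out.println(averageAbsoluteDifference(model, Map.of(2L, 3.0, 3L, 4.0)));
    }
}
```

In this sketch item 2 never appears in the recommended list, yet its
estimate still contributes to the average difference, because the
evaluation only ever calls the step-1 estimate.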

I take your point though. You could fairly easily
modify AbstractDifferenceRecommenderEvaluator to construct whatever test
and training data set you like. For example, you would probably put all
expired items in your training set and not in the test set.
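
One way such a split could look, as a rough standalone sketch rather
than a patch to the evaluator: every preference for an expired item goes
to the training set, and only preferences for current items are eligible
for the random test selection. The names ExpiryAwareSplit, Pref, and
trainingFraction are hypothetical and do not come from Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, not the Mahout evaluator: a split that never tests on expired items.
final class ExpiryAwareSplit {

    // One (user, item, rating) observation.
    static final class Pref {
        final long userId;
        final long itemId;
        final double value;

        Pref(long userId, long itemId, double value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    final List<Pref> training = new ArrayList<>();
    final List<Pref> test = new ArrayList<>();

    // Expired items always land in training; current items are split at random,
    // with roughly trainingFraction of them going to training.
    static ExpiryAwareSplit split(List<Pref> prefs, Set<Long> expiredItemIds,
                                  double trainingFraction, Random random) {
        ExpiryAwareSplit result = new ExpiryAwareSplit();
        for (Pref p : prefs) {
            boolean expired = expiredItemIds.contains(p.itemId);
            if (expired || random.nextDouble() < trainingFraction) {
                result.training.add(p);
            } else {
                result.test.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pref> prefs = List.of(
            new Pref(1L, 10L, 4.0), new Pref(1L, 20L, 3.0),
            new Pref(2L, 10L, 5.0), new Pref(2L, 30L, 2.0));
        // Assumption for the demo: item 10 is the expired one.
        ExpiryAwareSplit s = split(prefs, Set.of(10L), 0.7, new Random(42L));
        System.out.println(s.training.size() + " training, " + s.test.size() + " test");
    }
}
```

The expired items still inform whatever is learned from the training
data, but the measured error then reflects only items that could
actually be recommended.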

If you're OK just modifying the code, go for that.
If you'd like to think of a clean way to incorporate a hook that lets you
replace the random test/training selection with custom logic, that's cool
too. I think it would be some work, if not a great deal, to cleanly
refactor out the random sampling.
