Posted to user@mahout.apache.org by gabeweb <ga...@htc.com> on 2010/09/15 03:51:18 UTC

(Not) ignoring NaN predicted preference values

In AbstractDifferenceRecommenderEvaluator, I've noticed that in the
PreferenceEstimateCallable subclass, in the call() method, the code simply
ignores any item for which the recommender couldn't estimate a preference. 
It calls the recommender's estimatePreference() method and stores the
returned value in estimatedPreference.  Then it does

    if (!Float.isNaN(estimatedPreference)) {
          estimatedPreference = capEstimatedPreference(estimatedPreference);
          processOneEstimate(estimatedPreference, realPref);
    }

Is this really what it should be doing?  It doesn't even log a message
that no preference could be estimated.  When we are performing an
evaluation, this silently reduces the size of the test set.
And if the recommender can't actually estimate a preference for many
test items, then it clearly isn't a very good recommender, so
incurring no penalty at all for NaN items doesn't seem right.  (I don't
want to propose a specific solution in this email, but something like
using an overall average preference whenever the recommender can't return a
real predicted preference would seem more reasonable to me.)
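As a minimal sketch of the fallback idea (the method and class names here are hypothetical, not existing Mahout code; the overall average would have to be pre-computed from the data model):

```java
// Sketch: substitute a global average whenever the recommender returns
// NaN, so that every test item contributes to the evaluation score
// instead of being silently dropped.
public final class NaNFallback {

  private NaNFallback() {}

  /**
   * Returns the estimate itself when it is a real number, otherwise
   * falls back to the supplied overall average preference.
   */
  public static float effectiveEstimate(float estimatedPreference,
                                        float overallAveragePreference) {
    return Float.isNaN(estimatedPreference)
        ? overallAveragePreference
        : estimatedPreference;
  }
}
```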

I came across this issue because TreeClusteringRecommender actually
pre-calculates its predicted preferences when it builds the clusters -- but
it only pre-calculates the top 100 (NUM_CLUSTER_RECS) recommended items for
each cluster.  This means that when you run an evaluation, most of the test
items actually get "NaN" as the predicted value.  This problem can be easily
fixed by pre-calculating predicted preferences for *all* items in
TreeClusteringRecommender, but if you don't make this change, then the
evaluator is silently only evaluating those test items that are among the
100 items with the highest estimated preferences for each cluster.

Thanks.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Not-ignoring-NaN-predicted-preference-values-tp1476922p1476922.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: (Not) ignoring NaN predicted preference values

Posted by Sean Owen <sr...@gmail.com>.
It's a fair point. I don't see a clear way to include this information
in the average absolute difference figure. You could add in a
'penalty' datum of (max rating - min rating) or something to the
average, but that's a little artificial.
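For concreteness, the penalty idea might be folded into the average like this. This is a standalone sketch with hypothetical names, not the actual evaluator code:

```java
// Sketch: average absolute difference that charges the worst possible
// error (maxRating - minRating) for every item the recommender could
// not estimate, instead of skipping it.
public final class PenalizedMae {

  private PenalizedMae() {}

  public static double score(float[] estimates, float[] actuals,
                             float minRating, float maxRating) {
    double total = 0.0;
    for (int i = 0; i < estimates.length; i++) {
      if (Float.isNaN(estimates[i])) {
        total += maxRating - minRating;  // penalty datum for NaN
      } else {
        total += Math.abs(estimates[i] - actuals[i]);
      }
    }
    return total / estimates.length;
  }
}
```

On a 1-to-5 rating scale, a NaN estimate would contribute an error of 4, which is exactly why it feels a little artificial: it assumes the worst rather than measuring anything.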

1. But it could be separately reported, at least as a log message.

2. Your other points concern making a recommender return NaN in fewer
cases. Yes, you could return an average item pref in the absence of a
better answer. That sounds like a potential role for a wrapper
Recommender implementation that could be put on top of any other
implementation. (This doesn't exist yet, but could be written.)
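Such a wrapper might look roughly like the following. This is a sketch against a stripped-down, hypothetical interface with a single method; Mahout's actual Recommender interface has more methods that a real wrapper would also need to delegate:

```java
// EstimatingRecommender is a hypothetical minimal interface standing in
// for Mahout's Recommender, which has additional methods.
interface EstimatingRecommender {
  float estimatePreference(long userID, long itemID);
}

// Sketch: a wrapper that delegates to any underlying recommender and
// substitutes a fallback value (e.g. an overall average preference)
// whenever the delegate returns NaN.
final class AverageFallbackRecommender implements EstimatingRecommender {

  private final EstimatingRecommender delegate;
  private final float fallbackPreference;

  AverageFallbackRecommender(EstimatingRecommender delegate,
                             float fallbackPreference) {
    this.delegate = delegate;
    this.fallbackPreference = fallbackPreference;
  }

  @Override
  public float estimatePreference(long userID, long itemID) {
    float estimate = delegate.estimatePreference(userID, itemID);
    return Float.isNaN(estimate) ? fallbackPreference : estimate;
  }
}
```

The point of the wrapper is that it composes with any existing implementation without touching its code.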

3. For TreeClusteringRecommender, try replacing the fixed value
NUM_CLUSTER_RECS with possibleItemIDs.size(). I imagine this addresses
the issue and should scale OK.

I can do 1 and 3 easily.
