Posted to dev@mahout.apache.org by "Sean Owen (Commented) (JIRA)" <ji...@apache.org> on 2011/11/27 22:33:39 UTC

[jira] [Commented] (MAHOUT-898) Error in formula for preference estimation in GenericItemBasedRecommender

    [ https://issues.apache.org/jira/browse/MAHOUT-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158028#comment-13158028 ] 

Sean Owen commented on MAHOUT-898:
----------------------------------

I understand the issue, but this doesn't fix it. Say your ratings are between 1 and 5. Say you have similarity -0.5 to an item rated 3 and -0.5 to an item rated 4. Using the absolute value only in the denominator would lead you to estimate a preference of (-0.5*3 + -0.5*4) / (0.5 + 0.5) = -3.5, which is also not possible. It's not even reasonable to cap it to 1 here.
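
To make the arithmetic above concrete, here is a minimal sketch of the weighted-average estimate with and without the absolute value in the denominator. The class and method names are hypothetical and for illustration only; this is not the code in GenericItemBasedRecommender, just the formula as described in this issue.

```java
// Illustrative sketch only; names are hypothetical, not Mahout's API.
final class AbsDenominatorSketch {

    // Estimate = sum(sim_i * rating_i) / sum(sim_i), or / sum(|sim_i|)
    // when absDenominator is true (the change proposed in the patch).
    static double estimate(double[] sims, double[] ratings, boolean absDenominator) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < sims.length; i++) {
            numerator += sims[i] * ratings[i];
            denominator += absDenominator ? Math.abs(sims[i]) : sims[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] sims = {-0.5, -0.5};   // similarities from the example above
        double[] ratings = {3.0, 4.0};  // ratings of the two similar items
        // Absolute value in the denominator only: -3.5 / 1.0
        System.out.println(estimate(sims, ratings, true));  // prints -3.5
    }
}
```

The numerator keeps its sign while the denominator no longer does, so when every similarity is negative the estimate falls outside the 1 to 5 rating range.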

Really... negative weights are just a problem since they don't make sense. In practice, in the framework, the *only* metric with this problem is Pearson, since it's the only one that actually returns values < 0. In retrospect, it would have been nicer to define this as returning a value between 0 and 1.

You could use (1+similarity) as a weight, since that's at least nonnegative. I feel like I did it this way in the beginning... and took it out because it caused another problem. I'd have to think about just what that was. We could go back to that, but it has non-trivial implications.
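
As a rough sketch of the (1+similarity) idea, assuming similarities in [-1,1] so that the weights fall in [0,2]: the class and method names below are hypothetical, and this is not what the recommender currently does.

```java
// Illustrative sketch only; names are hypothetical, not Mahout's API.
final class ShiftedWeightSketch {

    // Weighted average using w_i = 1 + sim_i as the (nonnegative) weight.
    static double estimate(double[] sims, double[] ratings) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < sims.length; i++) {
            double weight = 1.0 + sims[i];
            numerator += weight * ratings[i];
            denominator += weight;
        }
        // If every similarity is exactly -1, all weights are 0 and
        // no estimate can be made.
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }

    public static void main(String[] args) {
        double[] sims = {-0.5, -0.5};
        double[] ratings = {3.0, 4.0};
        // Both weights are 0.5, so this is the plain average of 3 and 4.
        System.out.println(estimate(sims, ratings));  // prints 3.5
    }
}
```

With nonnegative weights the estimate always stays within the range of the ratings used. Note that a negatively correlated item still pulls the estimate toward its own rating, only with less weight.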

I don't want to make this exact change, but I'll leave the issue open for other ideas.
                
> Error in formula for preference estimation in GenericItemBasedRecommender
> -------------------------------------------------------------------------
>
>                 Key: MAHOUT-898
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-898
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>         Environment: mahout-core
>            Reporter: Paulo Villegas
>            Assignee: Sean Owen
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.6
>
>         Attachments: GenericItemBasedRecommender.diff
>
>
> The formula to estimate the preference for an item in the Taste item-based recommender normalizes by the sum of similarities for items used in estimation. But the terms in the sum taken to normalize should be in absolute value, since they can be negative (e.g. when using Pearson correlation, similarity is in [-1,1]). Currently they are not, and as a result, when there are negative and positive values they cancel out, giving a small denominator and incorrectly boosting the preference for the item (symptom: it is easy for a predicted preference to take the maximum value, since the quotient becomes large and it is capped afterwards).
> The patch is rather trivial (a one-liner, actually) for src/main/java/org/apache/mahout/cf/taste/impl/recommender/GenericItemBasedRecommender.java
> Note: the same error occurs, and the same fix applies, in GenericUserBasedRecommender
