Posted to dev@mahout.apache.org by "Cunlu Zou (JIRA)" <ji...@apache.org> on 2013/04/02 09:37:15 UTC

[jira] [Reopened] (MAHOUT-1185) MemoryDiffStorage.class has a bug for slope one algorithm which could cause incorrect recommendation results

     [ https://issues.apache.org/jira/browse/MAHOUT-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cunlu Zou reopened MAHOUT-1185:
-------------------------------


Please check the code carefully. Two values are calculated in the processOneUser function. The average diffs (the variable *average* in the code) are calculated correctly, as you said. But there is a second variable that holds the average preference value for each *individual item* (the variable *itemAverage* in the code), and the two are completely different. The itemAverage value is used to predict a preference when no diff values are available. For example, suppose we have the following user-preference matrix (a-c are users, A-C are items):
    |   | A | B | C |
    | a | 1 | - | 3 |
    | b | 2 | - | 4 |
    | c | - | 2 | - |
For user c, we want to predict the preference value for item C. We only know user c's preference for item B, and no diff value is available between B and C. In this case Mahout falls back to the average value for item C, (3+4)/2 = 3.5, as the predicted value. The same applies when predicting user c's preference for item A, where the average is (1+2)/2 = 1.5. Comparing the two predicted values, we recommend item C rather than item A to user c.
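
To make the arithmetic in this example concrete, here is a minimal Java sketch of the fallback. It is an illustration only and does not use Mahout's classes: the class name, the PREFS array, the hasDiff and itemAverage helpers, and the use of NaN for a missing preference are all assumptions made for this example.

```java
class ItemAverageFallbackExample {

    // NaN marks a missing preference. Rows are users a, b, c; columns are items A, B, C.
    static final double[][] PREFS = {
        {1.0, Double.NaN, 3.0},
        {2.0, Double.NaN, 4.0},
        {Double.NaN, 2.0, Double.NaN},
    };

    /** True if at least one user rated both items, so a diff between them exists. */
    static boolean hasDiff(double[][] prefs, int itemX, int itemY) {
        for (double[] row : prefs) {
            if (!Double.isNaN(row[itemX]) && !Double.isNaN(row[itemY])) {
                return true;
            }
        }
        return false;
    }

    /** Average of all known preferences for one item, or NaN if nobody rated it. */
    static double itemAverage(double[][] prefs, int item) {
        double sum = 0.0;
        int count = 0;
        for (double[] row : prefs) {
            if (!Double.isNaN(row[item])) {
                sum += row[item];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        int a = 0, b = 1, c = 2; // column indexes of items A, B, C
        // User c only rated item B, and no user rated B together with A or C.
        System.out.println("diff B/A available: " + hasDiff(PREFS, b, a));
        System.out.println("diff B/C available: " + hasDiff(PREFS, b, c));
        // With no diffs to use, the prediction falls back to the item averages.
        System.out.println("fallback prediction for A: " + itemAverage(PREFS, a));
        System.out.println("fallback prediction for C: " + itemAverage(PREFS, c));
    }
}
```

Both diff checks report false, so the two fallback averages (1.5 for A, 3.5 for C) are the only basis for ranking the items for user c.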

However, the code calculates this average value (*NOT* the DIFF value) incorrectly, as I stated in my previous comments. I hope this makes it clear.
                
> MemoryDiffStorage.class has a bug for slope one algorithm which could cause incorrect recommendation results
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1185
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1185
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.7
>         Environment: Ubuntu
>            Reporter: Cunlu Zou
>            Assignee: Sean Owen
>              Labels: patch
>         Attachments: MemoryDiffStorage.patch
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The function processOneUser(long averageCount, long userID) in MemoryDiffStorage.class contains a bug in the calculation of itemAverage. The function calculates the average difference between items (in a nested loop) and the average preference value of each individual item inside the same outer loop, which only runs from 0 to length-2 (*for (int i = 0; i < length - 1; i++)*). As a result, itemAverage never counts the last item's preference value for any user, which could lead to incorrect recommendation results.
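
The loop shape described above can be reproduced in isolation. The following Java sketch is a simplified stand-in, not the actual MemoryDiffStorage code and not necessarily what the attached patch does: the class name, the itemAverages method, its includeLastItem flag, and the encoding of items A, B, C as 0, 1, 2 are all assumptions made for illustration. Only the outer loop bound is taken from the report.

```java
import java.util.Map;
import java.util.TreeMap;

class ItemAverageLoopSketch {

    /**
     * Computes per-item average preferences for a set of users.
     * itemIDs[u] and values[u] hold user u's rated items and ratings, in the same order.
     * With includeLastItem == false the outer loop has the shape quoted in the report
     * (i < length - 1), so each user's last item never reaches the item-average update.
     */
    static Map<Long, Double> itemAverages(long[][] itemIDs, double[][] values,
                                          boolean includeLastItem) {
        Map<Long, double[]> sumAndCount = new TreeMap<>();
        for (int u = 0; u < itemIDs.length; u++) {
            int length = itemIDs[u].length;
            int outerBound = includeLastItem ? length : length - 1;
            for (int i = 0; i < outerBound; i++) {
                for (int j = i + 1; j < length; j++) {
                    // The diff between items i and j would be accumulated here (omitted).
                }
                double[] acc = sumAndCount.computeIfAbsent(itemIDs[u][i], k -> new double[2]);
                acc[0] += values[u][i]; // running sum of ratings for this item
                acc[1] += 1;            // number of ratings seen for this item
            }
        }
        Map<Long, Double> averages = new TreeMap<>();
        sumAndCount.forEach((item, acc) -> averages.put(item, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        // Users a, b, c from the example matrix; items A, B, C encoded as 0, 1, 2.
        long[][] itemIDs = {{0L, 2L}, {0L, 2L}, {1L}};
        double[][] values = {{1.0, 3.0}, {2.0, 4.0}, {2.0}};
        System.out.println("loop as reported:     " + itemAverages(itemIDs, values, false));
        System.out.println("outer bound = length: " + itemAverages(itemIDs, values, true));
    }
}
```

On the example matrix, the loop as reported yields an average for item A only. Item C is always the last item of users a and b, and item B is the only item of user c, so neither is ever counted. Extending the outer bound to length (the inner loop is then simply empty for the last item) is one possible adjustment that yields averages for all three items.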

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Reopened] (MAHOUT-1185) MemoryDiffStorage.class has a bug for slope one algorithm which could cause incorrect recommendation results

Posted by Sean Owen <sr...@gmail.com>.
OK I will have another look. That makes more sense. I think the index still
has to be adjusted but that's simple.