Posted to dev@mahout.apache.org by "Sebastian Schelter (JIRA)" <ji...@apache.org> on 2010/12/13 01:02:02 UTC

[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-541:
--------------------------------------

    Attachment: MAHOUT-541.patch

Tamas,

I created this patch from the files you supplied, and I also cleaned up the code a little. I did some simple testing and the recommender seems to work fine.

I left out one thing because I did not understand it: after training, you use a "modified" dataModel in which the original preferences are replaced by the estimated ones. What is the reason for doing this?

Another question: how can we verify the speedup this patch is supposed to bring? I ran an evaluation on the MovieLens 1M dataset and did not see any increase in computation speed, but maybe that dataset is too small or I chose the wrong parameters.

Can you please review the patch and check that I got everything right?



> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java
>
>
> I thought I'd put up this implementation of the popular SVD algorithm for recommender systems. It is based on the existing SVD recommender implementation, but instead of computing the full user and item matrices up front, it trains the model iteratively, feature by feature, which is the original approach Simon Funk proposed. The advantage of this implementation is that the dot product of each user-item pair does not have to be recalculated in every training cycle; the partial results can be cached, which speeds up the algorithm considerably.
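
For reference, here is a minimal sketch of the feature-by-feature training with cached partial predictions described above, assuming in-memory (user, item, rating) triples. The class and member names are illustrative only and are not taken from the attached files; the sketch also ignores the contribution of not-yet-trained features to the cached prediction, a common simplification of Funk's scheme.

import java.util.Random;

public class FunkSvdSketch {

  private final double[][] userFeatures;  // numUsers x numFeatures
  private final double[][] itemFeatures;  // numItems x numFeatures
  private final double learningRate = 0.005;
  private final double regularization = 0.02;

  public FunkSvdSketch(int numUsers, int numItems, int numFeatures) {
    Random random = new Random(42L);
    userFeatures = new double[numUsers][numFeatures];
    itemFeatures = new double[numItems][numFeatures];
    for (int u = 0; u < numUsers; u++) {
      for (int f = 0; f < numFeatures; f++) {
        userFeatures[u][f] = 0.1 * random.nextGaussian();
      }
    }
    for (int i = 0; i < numItems; i++) {
      for (int f = 0; f < numFeatures; f++) {
        itemFeatures[i][f] = 0.1 * random.nextGaussian();
      }
    }
  }

  /**
   * Trains one feature at a time. The dot product of the already-trained
   * features is cached per rating, so each update only touches the feature
   * currently being trained instead of recomputing the full prediction.
   */
  public void train(int[] users, int[] items, double[] ratings, int epochsPerFeature) {
    int numRatings = ratings.length;
    int numFeatures = userFeatures[0].length;
    // Contribution of the features trained so far, one entry per rating.
    double[] cachedPrediction = new double[numRatings];

    for (int f = 0; f < numFeatures; f++) {
      for (int epoch = 0; epoch < epochsPerFeature; epoch++) {
        for (int r = 0; r < numRatings; r++) {
          int u = users[r];
          int i = items[r];
          double uf = userFeatures[u][f];
          double vf = itemFeatures[i][f];
          double prediction = cachedPrediction[r] + uf * vf;
          double err = ratings[r] - prediction;
          userFeatures[u][f] += learningRate * (err * vf - regularization * uf);
          itemFeatures[i][f] += learningRate * (err * uf - regularization * vf);
        }
      }
      // Fold the now-trained feature into the cache before moving on.
      for (int r = 0; r < numRatings; r++) {
        cachedPrediction[r] += userFeatures[users[r]][f] * itemFeatures[items[r]][f];
      }
    }
  }

  public double estimatePreference(int user, int item) {
    double sum = 0.0;
    for (int f = 0; f < userFeatures[0].length; f++) {
      sum += userFeatures[user][f] * itemFeatures[item][f];
    }
    return sum;
  }
}

With this structure, each training pass over the ratings costs O(1) work per rating for the current feature rather than O(numFeatures), which is where the claimed speedup comes from.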

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.