Posted to dev@mahout.apache.org by "Tamas Jambor (JIRA)" <ji...@apache.org> on 2010/11/09 00:58:07 UTC

[jira] Created: (MAHOUT-541) Incremental SVD Implementation

Incremental SVD Implementation
------------------------------

                 Key: MAHOUT-541
                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
             Project: Mahout
          Issue Type: Improvement
          Components: Collaborative Filtering
            Reporter: Tamas Jambor


I thought I'd put up this implementation of the popular SVD algorithm for recommender systems. It is based on the existing SVD implementation, but instead of computing each user and each item matrix, it trains the model iteratively, which is the original version that Simon Funk proposed. The advantage of this implementation is that you don't have to recalculate the dot product of each user-item pair for each training cycle; the partial results can be cached, which speeds up the algorithm considerably.
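
To make the caching idea concrete, here is a minimal sketch of feature-by-feature training with a per-rating cache. It is not the attached code: the class name, method names, initial values, and hyperparameters are all assumptions chosen for illustration.

```java
import java.util.Random;

/** Hypothetical sketch; names are not taken from the attached files. */
class CachedFunkSvdSketch {
  final double[][] userFeatures; // [user][feature]
  final double[][] itemFeatures; // [item][feature]
  final int numFeatures;

  CachedFunkSvdSketch(int numUsers, int numItems, int numFeatures, long seed) {
    this.numFeatures = numFeatures;
    Random random = new Random(seed);
    userFeatures = randomMatrix(numUsers, numFeatures, random);
    itemFeatures = randomMatrix(numItems, numFeatures, random);
  }

  private static double[][] randomMatrix(int rows, int cols, Random random) {
    double[][] m = new double[rows][cols];
    for (double[] row : m) {
      for (int f = 0; f < cols; f++) {
        row[f] = 0.1 + (random.nextDouble() - 0.5) * 0.01; // small positive start (assumed)
      }
    }
    return m;
  }

  /**
   * Trains one feature at a time. cache[k] holds the summed contribution of all
   * features already trained for rating k, so each step needs one multiplication
   * instead of a full dot product. Returns the cache for inspection.
   */
  double[] train(int[] users, int[] items, double[] ratings,
                 int epochsPerFeature, double learningRate, double regularization) {
    double[] cache = new double[ratings.length];
    for (int f = 0; f < numFeatures; f++) {
      for (int epoch = 0; epoch < epochsPerFeature; epoch++) {
        for (int k = 0; k < ratings.length; k++) {
          double[] u = userFeatures[users[k]];
          double[] v = itemFeatures[items[k]];
          // Features after f are ignored in this sketch.
          double error = ratings[k] - (cache[k] + u[f] * v[f]);
          double oldU = u[f];
          u[f] += learningRate * (error * v[f] - regularization * oldU);
          v[f] += learningRate * (error * oldU - regularization * v[f]);
        }
      }
      // Feature f is now frozen: fold its contribution into the cache.
      for (int k = 0; k < ratings.length; k++) {
        cache[k] += userFeatures[users[k]][f] * itemFeatures[items[k]][f];
      }
    }
    return cache;
  }

  /** Full dot product, only needed outside the training loop. */
  double predict(int user, int item) {
    double sum = 0.0;
    for (int f = 0; f < numFeatures; f++) {
      sum += userFeatures[user][f] * itemFeatures[item][f];
    }
    return sum;
  }

  double rmse(int[] users, int[] items, double[] ratings) {
    double squared = 0.0;
    for (int k = 0; k < ratings.length; k++) {
      double diff = ratings[k] - predict(users[k], items[k]);
      squared += diff * diff;
    }
    return Math.sqrt(squared / ratings.length);
  }
}
```

The trade-off in this sketch is one extra double per rating in exchange for skipping the full dot product in the inner loop.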

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002994#comment-13002994 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

Hi.

I got around to running the test with MovieLens 10M (64 features and 20 iterations).

Training time, Mahout: 23185168 ms
Training time, this implementation: 4271031 ms
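
For readers who want the two timings side by side, the arithmetic can be sketched as below. The class and method names are made up for illustration; only the two millisecond values come from the message above. The ratio works out to roughly 5.4.

```java
import java.util.Locale;

/** Hypothetical helper for comparing the two reported training times. */
class TrainingTimeComparison {
  /** Ratio of the baseline time to the candidate time. */
  static double speedup(long baselineMs, long candidateMs) {
    return (double) baselineMs / candidateMs;
  }

  /** Converts milliseconds to hours. */
  static double toHours(long ms) {
    return ms / 3_600_000.0;
  }

  public static void main(String[] args) {
    long mahoutMs = 23_185_168L;  // reported above
    long patchedMs = 4_271_031L;  // reported above
    System.out.println(String.format(Locale.ROOT,
        "Mahout: %.2f h, this implementation: %.2f h, speedup: %.2fx",
        toHours(mahoutMs), toHours(patchedMs), speedup(mahoutMs, patchedMs)));
  }
}
```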


> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979218#action_12979218 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

I can run the performance tests on this one sometime at the end of January, if that's not too late.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004489#comment-13004489 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

1. Some bug fixes
2. Changed the way random noise is added
3. Changed when the preference values are shuffled

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970648#action_12970648 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

Hi Sebastian,

Sorry, I am not really familiar with how the patch process works.

Yes, it is based on the link you suggested.

As far as I remember, I changed the following method:

public void train(int steps) in TJSVDRecommender

and added the predictRating(int i, int j, int f, Preference pref, boolean bTrailing) method in TJExpectationMaximizationSVD.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979226#action_12979226 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

AFAIK, the plan is to release 0.5 around May 2011.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004497#comment-13004497 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

Sorry, NetBeans reformatted the structure. Here is another patch.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004098#comment-13004098 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

I would like to add two more small changes.

1. Collections.shuffle(cachedPreferences, random) could be called after training each feature (or even after each iteration).

2. The vector initialization doesn't look correct to me. This part:

defaultValue = Math.sqrt((average - 1.0) / numFeatures);
leftVectors[userIndex][feature] = defaultValue + (random.nextDouble() - 0.5) * randomNoise;
rightVectors[itemIndex][feature] = defaultValue + (random.nextDouble() - 0.5) * randomNoise;

The 0.5 should be replaced by something that depends on the number of features and the rating scale.

What do you think?
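
As a rough illustration of the first suggestion, the sketch below reshuffles the cached preferences before each feature is trained. The integers stand in for cached preference objects, and the class and method names are hypothetical; only Collections.shuffle(list, random) is a real library call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of reshuffling the training order once per feature. */
class ShuffleScheduleSketch {
  /**
   * Returns the order in which the cached preferences would be visited for
   * each feature. Integers 0..numPreferences-1 stand in for preference objects.
   */
  static List<List<Integer>> visitOrders(int numPreferences, int numFeatures, long seed) {
    Random random = new Random(seed);
    List<Integer> cachedPreferences = new ArrayList<>();
    for (int k = 0; k < numPreferences; k++) {
      cachedPreferences.add(k);
    }
    List<List<Integer>> orders = new ArrayList<>();
    for (int feature = 0; feature < numFeatures; feature++) {
      // Reshuffle in place before training this feature.
      Collections.shuffle(cachedPreferences, random);
      orders.add(new ArrayList<>(cachedPreferences));
      // ... the training pass for this feature would iterate cachedPreferences here ...
    }
    return orders;
  }
}
```

Moving the shuffle call inside the per-iteration loop instead would give the "after each iteration" variant.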

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-541:
-----------------------------

    Attachment: MAHOUT-541.patch

Here's my own version of the last patch, with package names fixed, some small formatting changes, and a slight reimplementation of SVDPreference.

Sebastian, are you in a position to comment on the change in light of the most recent performance info?

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Resolved: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-541.
------------------------------

    Resolution: Fixed

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: MAHOUT-541.patch

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: TJSVDRecommender.java
                SVDPreference.java
                ExpectationMaximizationSVD.java

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Tamas Jambor
>         Attachments: ExpectationMaximizationSVD.java, SVDPreference.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970652#action_12970652 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

Thanks, I'll have a look at those methods.

Details for contributing and creating patches can be found here: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004491#comment-13004491 ] 

Sean Owen commented on MAHOUT-541:
----------------------------------

Is this latest patch against HEAD? It seems to contain many more changes than you had mentioned.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003539#comment-13003539 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

I updated it to the latest snapshot, following the structure that you are using now.
Patch attached.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java

[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979187#action_12979187 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

The changes from this issue are not yet included. I could modify ExpectationMaximizationSVDFactorizer to include the changes proposed here, but I wanted to see some numbers for the performance improvements first, as we'd pay for them with higher memory consumption, if I understand the code correctly. My own tests did not show that the modified version runs significantly faster, but maybe I didn't do the right testing.

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: MAHOUT-541.patch, MAHOUT-541.patch, SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment:     (was: TJSVDRecommender.java)

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Tamas Jambor
>         Attachments: SVDPreference.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970642#action_12970642 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

Hi Tamas,

I'd like to review this, but that's kinda difficult because you did not supply a patch from which your changes can be seen. I'm looking through the code and getting some idea about the improvements you've made, yet it would be very helpful if you could point me to the main code passages you've changed.

Are you referring to http://sifter.org/~simon/journal/20061211.html when talking about the original version of the algorithm?

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Tamas Jambor
>            Assignee: Sean Owen
>             Fix For: 0.5
>
>         Attachments: SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: TJSVDRecommender.java
                TJExpectationMaximizationSVD.java

> Incremental SVD Implementation
> ------------------------------
>
>                 Key: MAHOUT-541
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-541
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>            Reporter: Tamas Jambor
>         Attachments: SVDPreference.java, TJExpectationMaximizationSVD.java, TJSVDRecommender.java


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004151#comment-13004151 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

The problem with #2 is that the variable randomNoise doesn't entirely control the noise that affects the rating (and the error). The noise also depends on the rating scale and the number of features. I suggest this solution (which does make my experiments more stable):

        double prefInterval = dataModel.getMaxPreference() - dataModel.getMinPreference();
        defaultValue = Math.sqrt((average - (prefInterval * 0.1)) / numFeatures);
        double interval = (prefInterval * 0.1) / numFeatures;

        leftVectors[userIndex][feature] = defaultValue + (random.nextDouble() - 0.5) * interval * randomNoise;
        rightVectors[itemIndex][feature] = defaultValue + (random.nextDouble() - 0.5) * interval * randomNoise;
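
To make the suggestion concrete, here is a minimal, self-contained sketch of that initialization. The class name FeatureInitSketch, the init method, and its parameters are hypothetical and only for illustration; the minimum and maximum preference are passed in directly instead of being read from a DataModel, and the argument check is an added assumption. With every feature set to defaultValue, the dot product of a user vector and an item vector is numFeatures * defaultValue^2 = average - 0.1 * prefInterval, so the initial predictions start slightly below the average rating.

```java
import java.util.Random;

/** Hypothetical sketch of the suggested initialization; not the code from the patch. */
final class FeatureInitSketch {

  /** Returns a rows x numFeatures matrix filled with defaultValue plus bounded noise. */
  static double[][] init(int rows, int numFeatures, double average,
                         double minPreference, double maxPreference,
                         double randomNoise, Random random) {
    double prefInterval = maxPreference - minPreference;
    double shifted = average - prefInterval * 0.1;
    // Assumption for this sketch: reject inputs that would make the square root NaN.
    if (shifted < 0.0) {
      throw new IllegalArgumentException("average is too small for this rating scale");
    }
    double defaultValue = Math.sqrt(shifted / numFeatures);
    // The noise width scales with the rating range and shrinks with the feature count.
    double interval = (prefInterval * 0.1) / numFeatures;

    double[][] vectors = new double[rows][numFeatures];
    for (int row = 0; row < rows; row++) {
      for (int feature = 0; feature < numFeatures; feature++) {
        vectors[row][feature] =
            defaultValue + (random.nextDouble() - 0.5) * interval * randomNoise;
      }
    }
    return vectors;
  }

  public static void main(String[] args) {
    // Example values only: a 1..5 rating scale with an average rating of 3.5.
    double[][] leftVectors = init(3, 4, 3.5, 1.0, 5.0, 0.005, new Random(42L));
    System.out.println(leftVectors[0][0]);
  }
}
```

Each entry differs from defaultValue by at most interval * randomNoise / 2, so the spread now follows the rating scale and the number of features rather than randomNoise alone.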


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: MAHOUT-541.patch


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979225#action_12979225 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

OK, will do. When will 0.5 be released, by the way?


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003830#comment-13003830 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

The performance speedup looks incredibly good. I think it definitely balances out the slightly increased memory consumption, so let's commit the patch. Thank you, Tamas!


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979224#action_12979224 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

That's fine, take your time. Could you update the patch to work with the current trunk? I did some refactoring, but I didn't change the way the algorithm worked.


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: MAHOUT-541.patch


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment: MAHOUT-541.patch


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979181#action_12979181 ] 

Sean Owen commented on MAHOUT-541:
----------------------------------

Am I right that Sebastian's latest changes kind of incorporate this? The last patch here deals with ExpectationMaximizationSVD but Sebastian moved this out to ExpectationMaximizationSVDFactorizer. Is this effectively rolled into what SS did?


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004635#comment-13004635 ] 

Hudson commented on MAHOUT-541:
-------------------------------

Integrated in Mahout-Quality #664 (See [https://hudson.apache.org/hudson/job/Mahout-Quality/664/])
    MAHOUT-541 part 2, more refinement from Tamas



[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970659#action_12970659 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

Thanks. I'll do it that way next time.
In fact, I have a nice SVD implementation for implicit feedback, which I have always wanted to share with you guys.


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter updated MAHOUT-541:
--------------------------------------

    Attachment: MAHOUT-541.patch

Tamas,

I created this patch from the files you supplied, and I also cleaned up the code a little. I did some simple testing and the recommender seems to work fine.

I left out something because I did not understand it: you use a "modified" dataModel after training, where the original preferences are replaced by the estimated ones. What's the reason for doing this?

Another question: how can we test the speedup this patch should bring? I did some evaluation on the MovieLens 1M dataset and didn't see any increase in computation speed, but maybe that dataset is too small or I got the parameters wrong.

Can you please review the patch and see if I got everything right? 




[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-541:
-----------------------------

    Affects Version/s: 0.4
        Fix Version/s: 0.5
             Assignee: Sean Owen


[jira] Updated: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tamas Jambor updated MAHOUT-541:
--------------------------------

    Attachment:     (was: ExpectationMaximizationSVD.java)


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004106#comment-13004106 ] 

Sean Owen commented on MAHOUT-541:
----------------------------------

For #2, I believe it is just trying to compute a value that is randomly within an interval of size "randomNoise" about "defaultValue". In that sense 0.5 makes sense as a fixed value. Don't know about #1 but trust you guys know your stuff. Is it worth a patch to describe completely?
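
A small sketch of that reading, assuming the expression in question has the form defaultValue + (random.nextDouble() - 0.5) * randomNoise (the class and method names here are hypothetical). Since Random.nextDouble() returns a value in [0, 1), subtracting 0.5 centers the noise on zero, so the result falls in [defaultValue - randomNoise / 2, defaultValue + randomNoise / 2).

```java
import java.util.Random;

/** Hypothetical illustration of noise centered on a default value. */
final class CenteredNoiseSketch {

  /** Draws a value from an interval of width randomNoise centered on defaultValue. */
  static double sample(double defaultValue, double randomNoise, Random random) {
    // nextDouble() is in [0, 1), so (nextDouble() - 0.5) is in [-0.5, 0.5).
    return defaultValue + (random.nextDouble() - 0.5) * randomNoise;
  }

  public static void main(String[] args) {
    Random random = new Random(7L);
    double defaultValue = 0.1;  // example value only
    double randomNoise = 0.01;  // example value only
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < 100000; i++) {
      double value = sample(defaultValue, randomNoise, random);
      min = Math.min(min, value);
      max = Math.max(max, value);
    }
    // Both extremes stay within defaultValue +/- randomNoise / 2.
    System.out.println(min + " .. " + max);
  }
}
```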


[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Tamas Jambor (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974099#action_12974099 ] 

Tamas Jambor commented on MAHOUT-541:
-------------------------------------

Sorry for the late reply.

I had a look at the code and it seems good. I made some slight modifications and attached the patch.

I don't know the reason for storing a modified data model with the predicted ratings. It was implemented in Mahout but later removed, as far as I remember.

I have tested the speedup, and it was much faster with the MovieLens 1M dataset. If you have a look at the code, far fewer calculations are needed (especially calculating the dot product for prediction during training). I can run that to confirm.






[jira] Commented: (MAHOUT-541) Incremental SVD Implementation

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974104#action_12974104 ] 

Sebastian Schelter commented on MAHOUT-541:
-------------------------------------------

No need to apologize for a late reply; we're all doing this in our spare time. Thank you for looking through the patch.

Can you give us some numbers on the speed increase or describe your test setup again? I did some simple testing too and didn't see it run faster, but maybe I just got something wrong. As I understand it, the code's intention is to trade higher memory usage (the cached values) for a potential speed increase (not having to calculate the dot products). Did I get that right?
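
The trade-off described here can be sketched roughly as follows. This is a hypothetical, simplified illustration in the spirit of Simon Funk's feature-by-feature training, not the code from the patch: the class name, method names, and parameters are made up, regularization is omitted, and features that have not been trained yet are ignored in the prediction. The cache array holds one extra double per rating (the added memory) so that each training step reads a single cached value instead of recomputing a full dot product over all features.

```java
import java.util.Arrays;

/** Hypothetical sketch of trading memory (a per-rating cache) for speed. */
final class CachedSvdSketch {

  /** Trains one feature at a time and returns {userFeatures, itemFeatures}. */
  static double[][][] train(int[] users, int[] items, double[] ratings,
                            int numUsers, int numItems, int numFeatures,
                            int epochsPerFeature, double learningRate, double init) {
    double[][] userFeatures = new double[numUsers][numFeatures];
    double[][] itemFeatures = new double[numItems][numFeatures];
    for (double[] row : userFeatures) {
      Arrays.fill(row, init);
    }
    for (double[] row : itemFeatures) {
      Arrays.fill(row, init);
    }

    // cache[k] = contribution of all already-trained features to rating k.
    double[] cache = new double[ratings.length];

    for (int f = 0; f < numFeatures; f++) {
      for (int epoch = 0; epoch < epochsPerFeature; epoch++) {
        for (int k = 0; k < ratings.length; k++) {
          double u = userFeatures[users[k]][f];
          double v = itemFeatures[items[k]][f];
          // One multiplication plus a cache lookup instead of a full dot product.
          double error = ratings[k] - (cache[k] + u * v);
          userFeatures[users[k]][f] = u + learningRate * error * v;
          itemFeatures[items[k]][f] = v + learningRate * error * u;
        }
      }
      // Fold the finished feature into the cache before training the next one.
      for (int k = 0; k < ratings.length; k++) {
        cache[k] += userFeatures[users[k]][f] * itemFeatures[items[k]][f];
      }
    }
    return new double[][][] {userFeatures, itemFeatures};
  }

  /** Full dot product, needed only when predicting after training. */
  static double predict(double[] userVector, double[] itemVector) {
    double sum = 0.0;
    for (int f = 0; f < userVector.length; f++) {
      sum += userVector[f] * itemVector[f];
    }
    return sum;
  }

  public static void main(String[] args) {
    // Toy data: two users, two items, four ratings (example values only).
    int[] users = {0, 0, 1, 1};
    int[] items = {0, 1, 0, 1};
    double[] ratings = {5.0, 3.0, 4.0, 2.0};
    double[][][] model = train(users, items, ratings, 2, 2, 2, 500, 0.01, 0.1);
    System.out.println(predict(model[0][0], model[1][0]));
  }
}
```

Without the cache, each inner step would cost a dot product over all features; with it, the per-step cost no longer depends on the number of features, at the price of one stored value per rating.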
