Posted to dev@mahout.apache.org by "Zeno Gantner (JIRA)" <ji...@apache.org> on 2012/10/29 21:50:13 UTC

[jira] [Created] (MAHOUT-1106) SVD++

Zeno Gantner created MAHOUT-1106:
------------------------------------

             Summary: SVD++
                 Key: MAHOUT-1106
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1106
             Project: Mahout
          Issue Type: New Feature
          Components: Collaborative Filtering
            Reporter: Zeno Gantner
            Assignee: Sean Owen


Initial shot at SVD++.
Relies on the RatingsSGDFactorizer class introduced in MAHOUT-1089.

One could also think about several enhancements, e.g. having separate regularization constants for user and item factors.
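One way to picture that enhancement is a single SGD update in which the user and item factors are shrunk by different constants. The sketch below is a minimal illustration under stated assumptions: it uses a simplified model (global mean plus dot product, biases omitted), and the class name and the parameters `lambdaUser` and `lambdaItem` are hypothetical, not part of Mahout's RatingSGDFactorizer.

```java
/**
 * Hypothetical sketch (not Mahout's API): SGD with separate regularization
 * constants for user factors (lambdaUser) and item factors (lambdaItem).
 * Simplified model: prediction = mu + p . q, bias terms omitted.
 */
public final class SeparateRegularizationSgd {

  /** Updates p and q in place for one observed rating; returns the error before the update. */
  public static double step(double[] p, double[] q, double mu, double rating,
                            double learningRate, double lambdaUser, double lambdaItem) {
    double prediction = mu;
    for (int f = 0; f < p.length; f++) {
      prediction += p[f] * q[f];
    }
    double err = rating - prediction;
    for (int f = 0; f < p.length; f++) {
      double oldP = p[f]; // the item update uses the user factor from before this step
      p[f] += learningRate * (err * q[f] - lambdaUser * p[f]);
      q[f] += learningRate * (err * oldP - lambdaItem * q[f]);
    }
    return err;
  }

  public static void main(String[] args) {
    double[] p = {0.1, 0.1};
    double[] q = {0.1, 0.1};
    double firstErr = step(p, q, 3.0, 4.0, 0.05, 0.01, 0.05);
    double lastErr = firstErr;
    for (int iter = 0; iter < 2000; iter++) {
      lastErr = step(p, q, 3.0, 4.0, 0.05, 0.01, 0.05);
    }
    System.out.printf("error before: %.4f, after 2000 more steps: %.4f%n", firstErr, lastErr);
  }
}
```

Setting lambdaUser equal to lambdaItem recovers the usual single-constant update.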

I am also the author of the SVDPlusPlus class in MyMediaLite, so if there are any similarities, no need to worry -- I am okay with relicensing this to the Apache 2.0 license.
https://github.com/zenogantner/MyMediaLite/blob/master/src/MyMediaLite/RatingPrediction/SVDPlusPlus.cs


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498767#comment-13498767 ] 

Sean Owen commented on MAHOUT-1106:
-----------------------------------

Yes, I think this is true, if you ignore lambda. The SVD++ model explains the user's latent factors as some combination of explicit and implicit factors. Why does the model think you like Shrek? Is it because you rated Shrek 4 stars, or because you clicked it 6 times? Either, both, or some mix of the two could make sense. The regularization term constrains the model to a 'simple' explanation involving both, and lambda should be positive. So if the premise is no regularization: don't do that, I suppose. Even with regularization the solution is not necessarily unique, but the non-uniqueness is not of this form.
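The point about regularization can be checked on a toy version of the problem. The sketch below is illustrative only, not Mahout code: one user, one rated item whose factor is fixed at 1, a single explicit factor p and a single implicit factor y, so the cost is J(p, y) = (p + y - r)^2 + lambda * (p^2 + y^2). Setting both partial derivatives to zero gives p = y = r / (2 + lambda) when lambda > 0, whereas with lambda = 0 every pair with p + y = r is optimal.

```java
/**
 * Toy illustration (not Mahout code): one rating r is explained by an explicit
 * factor p plus an implicit factor y, with the item factor fixed at 1, so
 * J(p, y) = (p + y - r)^2 + lambda * (p^2 + y^2).
 */
public final class RegularizationPinsDownSplit {

  public static double cost(double p, double y, double r, double lambda) {
    double residual = p + y - r;
    return residual * residual + lambda * (p * p + y * y);
  }

  /** For lambda > 0, dJ/dp = dJ/dy = 0 forces p = y = r / (2 + lambda). */
  public static double optimalShare(double r, double lambda) {
    return r / (2.0 + lambda);
  }

  public static void main(String[] args) {
    double r = 4.0;
    // lambda = 0: every split with p + y = r fits perfectly, so the minimizer is not unique.
    for (double p : new double[] {0.0, 1.0, 2.0, 4.0}) {
      System.out.printf("lambda=0    p=%.1f y=%.1f cost=%.4f%n", p, r - p, cost(p, r - p, r, 0.0));
    }
    // lambda = 0.1: the even split beats an uneven split with the same p + y.
    double lambda = 0.1;
    double s = optimalShare(r, lambda);
    System.out.printf("lambda=0.1  p=%.4f y=%.4f cost=%.4f%n", s, s, cost(s, s, r, lambda));
    System.out.printf("lambda=0.1  p=%.4f y=%.4f cost=%.4f%n", s + 0.5, s - 0.5, cost(s + 0.5, s - 0.5, r, lambda));
  }
}
```

Shifting part of the explanation from y to p keeps the fit identical but raises the penalty, which is the sense in which a positive lambda picks a 'simple' explanation.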

There's a more interesting general question about explicit vs. implicit feedback. I certainly don't think you can ignore implicit feedback; most of the data in the world is implicit. My real question is whether it's more interesting to forget 'explicit' data entirely, since it's rare and noisy. This is why I personally like ALS-WR: it is really the same idea, much simplified and faster, since there is no mean or explicit term to worry about. You could argue it's coarser, but if you believe it's a world of 99% implicit data, the difference is negligible.
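For comparison, the core of ALS-WR (alternating least squares with weighted-lambda regularization) is a small ridge-regression solve per user: with the item factors fixed, the user vector satisfies (Q_u^T Q_u + lambda * n_u * I) p_u = Q_u^T r_u, where Q_u stacks the factors of the n_u items the user rated and r_u holds those ratings. The sketch below is a minimal, hypothetical illustration of that explicit-feedback step only; it is not Mahout's ALSWRFactorizer API, and the implicit-feedback variant (which weights observations by confidence) is not shown.

```java
/**
 * Hypothetical sketch (not Mahout's API): the explicit-feedback ALS-WR user step
 * (Q_u^T Q_u + lambda * n_u * I) p_u = Q_u^T r_u.
 */
public final class AlsWrUserStep {

  /** itemFactors[i] is the factor vector of the i-th item the user rated; ratings[i] is that rating. */
  public static double[] solveUser(double[][] itemFactors, double[] ratings, double lambda) {
    int k = itemFactors[0].length;
    int nu = ratings.length;
    double[][] a = new double[k][k + 1]; // augmented matrix [A | b]
    for (int i = 0; i < nu; i++) {
      double[] q = itemFactors[i];
      for (int r = 0; r < k; r++) {
        for (int c = 0; c < k; c++) {
          a[r][c] += q[r] * q[c];
        }
        a[r][k] += q[r] * ratings[i];
      }
    }
    for (int r = 0; r < k; r++) {
      a[r][r] += lambda * nu; // weighted-lambda: regularization scales with the number of ratings
    }
    return gaussianElimination(a, k);
  }

  /** Gaussian elimination with partial pivoting on a k x (k + 1) augmented matrix. */
  private static double[] gaussianElimination(double[][] a, int k) {
    for (int col = 0; col < k; col++) {
      int pivot = col;
      for (int r = col + 1; r < k; r++) {
        if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
          pivot = r;
        }
      }
      double[] tmp = a[col];
      a[col] = a[pivot];
      a[pivot] = tmp;
      for (int r = col + 1; r < k; r++) {
        double factor = a[r][col] / a[col][col];
        for (int c = col; c <= k; c++) {
          a[r][c] -= factor * a[col][c];
        }
      }
    }
    double[] x = new double[k];
    for (int r = k - 1; r >= 0; r--) {
      double sum = a[r][k];
      for (int c = r + 1; c < k; c++) {
        sum -= a[r][c] * x[c];
      }
      x[r] = sum / a[r][r];
    }
    return x;
  }

  public static void main(String[] args) {
    double[][] q = {{1.0, 0.0}, {0.0, 1.0}, {1.0, 1.0}}; // made-up item factors
    double[] ratings = {4.0, 2.0, 5.0};
    double[] p = solveUser(q, ratings, 0.05);
    System.out.printf("p_u = [%.4f, %.4f]%n", p[0], p[1]);
  }
}
```

Because the system matrix is positive definite whenever lambda > 0 and the user has at least one rating, the solve succeeds in that case; a production implementation would more typically use a Cholesky or QR factorization than hand-written elimination.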
                

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Zeno Gantner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507995#comment-13507995 ] 

Zeno Gantner commented on MAHOUT-1106:
--------------------------------------

I agree with Sean. Implicit feedback makes up 99% of the data.

You would use SVD++ only in cases where you have explicit ratings (or thumbs up/down).

                

[jira] [Updated] (MAHOUT-1106) SVD++

Posted by "Zeno Gantner (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zeno Gantner updated MAHOUT-1106:
---------------------------------

    Attachment: SVDPlusPlusFactorizer.java
    

[jira] [Comment Edited] (MAHOUT-1106) SVD++

Posted by "Agnonchik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498742#comment-13498742 ] 

Agnonchik edited comment on MAHOUT-1106 at 11/16/12 11:37 AM:
--------------------------------------------------------------

May I ask an abstract question about the SVD++ algorithm here? It has nothing to do with the code. Please excuse me if I'm posting it in the wrong place.
I wonder whether the optimization problem solved by the SVD++ algorithm has a unique solution.
It seems that in some cases, for example when the regularization parameter lambda is equal to zero, the problem admits multiple solutions.
We can write the SVD++ model as

ratingPrediction(user, item) = mu + bu(user) + bi(item) + (p(user) + |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)) * q(item)^T

and the learning algorithm tries to minimize the following cost function

sum_{(user, item) from R} (ratingPrediction - observedRating)^2 + lambda * (||bu||_2^2 + ||bi||_2^2 + ||P||_F^2 + ||Q||_F^2 + ||Y||_F^2)

where P = [p(1); ... ;p(m)], Q = [q(1); ... ;q(n)], Y = [y(1); ... ;y(n)].
Let's introduce the matrix Z such that

[Z * Y](user) = |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)

i.e. Z is the m x n matrix with Z(user, implItem) = |N(user)|^(-0.5) if implItem is in N(user) and 0 otherwise.
Then, with lambda = 0, for any solution P and Y of the optimization problem and an arbitrary matrix Y2, the pair P2 = P + Z * (Y - Y2) and Y2 is also a solution, since P2 + Z * Y2 = P + Z * Y leaves every prediction unchanged.

Am I right?
If so, my point is that applying SVD++ doesn't make much sense compared with biased SVD, which ignores implicit feedback (the Y parameter).
Thanks!
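The argument can be checked numerically. The following sketch is illustrative only and not Mahout code: it draws random P, Q, Y and an arbitrary Y2 for a made-up set of implicit items N(user), sets the biases to zero for brevity, forms P2 = P + Z * (Y - Y2) row by row (row u of Z * Y is the normalized sum of y(j) over j in N(u)), and reports how much any prediction changes.

```java
import java.util.Random;

/** Illustrative check (not Mahout code) that (P2, Y2) gives the same SVD++ predictions as (P, Y). */
public final class SvdPlusPlusNonUniqueness {

  /** [Z * Y](user) = |N(user)|^(-0.5) * sum_{j in N(user)} y(j). */
  static double[] implicitTerm(int[] nUser, double[][] y) {
    int k = y[0].length;
    double[] sum = new double[k];
    for (int j : nUser) {
      for (int f = 0; f < k; f++) {
        sum[f] += y[j][f];
      }
    }
    double norm = 1.0 / Math.sqrt(nUser.length);
    for (int f = 0; f < k; f++) {
      sum[f] *= norm;
    }
    return sum;
  }

  /** mu + (p(user) + [Z * Y](user)) * q(item)^T, with bu and bi set to zero. */
  static double predict(double mu, double[] p, double[] zy, double[] q) {
    double dot = 0.0;
    for (int f = 0; f < q.length; f++) {
      dot += (p[f] + zy[f]) * q[f];
    }
    return mu + dot;
  }

  static double[][] randomMatrix(Random rnd, int rows, int cols) {
    double[][] a = new double[rows][cols];
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        a[r][c] = rnd.nextGaussian();
      }
    }
    return a;
  }

  /** Builds random P, Q, Y, Y2, sets P2 = P + Z * (Y - Y2), and returns the largest prediction change. */
  public static double maxPredictionDifference(long seed) {
    Random rnd = new Random(seed);
    int users = 3;
    int items = 4;
    int k = 2;
    int[][] n = {{0, 1}, {1, 2, 3}, {0, 3}}; // N(user): made-up implicit feedback
    double mu = 3.5;
    double[][] p = randomMatrix(rnd, users, k);
    double[][] q = randomMatrix(rnd, items, k);
    double[][] y = randomMatrix(rnd, items, k);
    double[][] y2 = randomMatrix(rnd, items, k); // arbitrary replacement for Y

    double maxDiff = 0.0;
    for (int u = 0; u < users; u++) {
      double[] zy = implicitTerm(n[u], y);
      double[] zy2 = implicitTerm(n[u], y2);
      double[] p2 = new double[k];
      for (int f = 0; f < k; f++) {
        p2[f] = p[u][f] + zy[f] - zy2[f]; // row u of P + Z * (Y - Y2)
      }
      for (int i = 0; i < items; i++) {
        double before = predict(mu, p[u], zy, q[i]);
        double after = predict(mu, p2, zy2, q[i]);
        maxDiff = Math.max(maxDiff, Math.abs(before - after));
      }
    }
    return maxDiff;
  }

  public static void main(String[] args) {
    // Expected to be at the level of floating-point rounding error.
    System.out.println("max prediction difference: " + maxPredictionDifference(42L));
  }
}
```

Because every prediction is unchanged, the squared-error part of the cost is unchanged too; only the lambda * (||P||_F^2 + ||Y||_F^2) part of the penalty can tell the two solutions apart, which is why the argument needs lambda = 0.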
                

[jira] [Comment Edited] (MAHOUT-1106) SVD++

Posted by "Agnonchik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498742#comment-13498742 ] 

Agnonchik edited comment on MAHOUT-1106 at 11/16/12 11:30 AM:
--------------------------------------------------------------

May I ask an abstract question about the SVD++ algorithm here? It has nothing to do with the code. Please excuse me if I'm posting it in the wrong place.
I wonder whether the optimization problem solved by the SVD++ algorithm has a unique solution.
It seems that in some cases, for example when the regularization parameter lambda is equal to zero, the problem admits multiple solutions.
We can write the SVD++ model as

ratingPrediction(user, item) = mu + bu(user) + bi(item) + (p(user) + |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)) * q(item)^T

and the learning algorithm tries to minimize the following cost function

sum_{(user, item) from R} (ratingPrediction - observedRating)^2 + lambda * (||bu||_2^2 + ||bi||_2^2 + ||P||_F^2 + ||Q||_F^2 + ||Y||_F^2)

where P = [p(1); ... ;p(m)], Q = [q(1); ... ;q(n)], Y = [y(1); ... ;y(n)].
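Let's introduce the matrix Z such that

[Z * Y](user) = |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)

i.e. Z is the m x n matrix with Z(user, implItem) = |N(user)|^(-0.5) if implItem is in N(user) and 0 otherwise.
Then, with lambda = 0, for any solution P and Y of the optimization problem and an arbitrary matrix Y2, the pair P2 = P + Z * (Y - Y2) and Y2 is also a solution, since P2 + Z * Y2 = P + Z * Y leaves every prediction unchanged.

Am I right?
If so, the point is that applying SVD++ doesn't make much sense compared with biased SVD.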
Thanks!
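The cost function in the question can be written out directly. The sketch below is a minimal, hypothetical implementation (not taken from SVDPlusPlusFactorizer.java): ratings are passed as parallel arrays of user indices, item indices and values, n[user] lists the items in N(user), and an empty N(user) is assumed to contribute nothing to the prediction.

```java
/**
 * Hypothetical sketch of the SVD++ objective from the question:
 * sum over observed ratings of (ratingPrediction - observedRating)^2
 *   + lambda * (||bu||^2 + ||bi||^2 + ||P||_F^2 + ||Q||_F^2 + ||Y||_F^2).
 */
public final class SvdPlusPlusCost {

  static double squaredNorm(double[] v) {
    double s = 0.0;
    for (double x : v) {
      s += x * x;
    }
    return s;
  }

  /** Squared Frobenius norm: the sum of squared entries over all rows. */
  static double squaredNorm(double[][] m) {
    double s = 0.0;
    for (double[] row : m) {
      s += squaredNorm(row);
    }
    return s;
  }

  /** mu + bu[u] + bi[i] + (p[u] + |N(u)|^(-0.5) * sum_{j in N(u)} y[j]) . q[i] */
  static double predict(int u, int i, double mu, double[] bu, double[] bi,
                        double[][] p, double[][] q, double[][] y, int[][] n) {
    int k = p[u].length;
    // Assumption: an empty N(u) contributes nothing (the formula is undefined there).
    double norm = n[u].length == 0 ? 0.0 : 1.0 / Math.sqrt(n[u].length);
    double dot = 0.0;
    for (int f = 0; f < k; f++) {
      double implicitSum = 0.0;
      for (int j : n[u]) {
        implicitSum += y[j][f];
      }
      dot += (p[u][f] + norm * implicitSum) * q[i][f];
    }
    return mu + bu[u] + bi[i] + dot;
  }

  public static double cost(int[] users, int[] items, double[] values, double mu,
                            double[] bu, double[] bi, double[][] p, double[][] q,
                            double[][] y, int[][] n, double lambda) {
    double squaredError = 0.0;
    for (int r = 0; r < values.length; r++) {
      double err = predict(users[r], items[r], mu, bu, bi, p, q, y, n) - values[r];
      squaredError += err * err;
    }
    double penalty = squaredNorm(bu) + squaredNorm(bi) + squaredNorm(p) + squaredNorm(q) + squaredNorm(y);
    return squaredError + lambda * penalty;
  }

  public static void main(String[] args) {
    // One user, one item, one factor; N(user) = {item 0}. Made-up numbers.
    double c = cost(new int[] {0}, new int[] {0}, new double[] {5.0}, 3.0,
        new double[] {0.5}, new double[] {0.5},
        new double[][] {{1.0}}, new double[][] {{1.0}}, new double[][] {{1.0}},
        new int[][] {{0}}, 0.1);
    System.out.println("cost = " + c);
  }
}
```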
                

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Agnonchik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498742#comment-13498742 ] 

Agnonchik commented on MAHOUT-1106:
-----------------------------------

May I ask an abstract question about the SVD++ algorithm here? It has nothing to do with the code. Please excuse me if I'm posting it in the wrong place.
I wonder whether the optimization problem solved by the SVD++ algorithm has a unique solution.
It seems that in some cases, for example when the regularization parameter lambda is equal to zero, the problem admits multiple solutions.
We can write the SVD++ model as

ratingPrediction(user, item) = mu + bu(user) + bi(item) + (p(user) + |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)) * q(item)^T

and the learning algorithm tries to minimize the following cost function

sum_{(user, item) from R} (ratingPrediction - observedRating)^2 + lambda * (||bu||_2^2 + ||bi||_2^2 + ||P||_F^2 + ||Q||_F^2 + ||Y||_F^2)

where P = [p(1); ... ;p(m)], Q = [q(1); ... ;q(n)], Y = [y(1); ... ;y(n)].
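Let's introduce the matrix Z such that

[Z * Y](user) = |N(user)|^(-0.5) * sum_{implItem from N(user)} y(implItem)

i.e. Z is the m x n matrix with Z(user, implItem) = |N(user)|^(-0.5) if implItem is in N(user) and 0 otherwise.
Then, with lambda = 0, for any solution P and Y of the optimization problem and an arbitrary matrix Y2, the pair P2 = P + Z * (Y - Y2) and Y2 is also a solution, since P2 + Z * Y2 = P + Z * Y leaves every prediction unchanged.

Am I right?
If so, applying SVD++ to rating data doesn't make much sense compared with biased SVD.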
Thanks!
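How the learning algorithm minimizes that cost can be sketched with the SGD updates commonly published for SVD++, applied to a single observed rating. This is a minimal illustration under stated assumptions, not the code attached to this issue: one learning rate gamma and one regularization constant lambda are used for all parameters, and yN holds references to the y(j) rows for j in N(user), which are updated in place.

```java
/**
 * Hypothetical sketch (not the attached SVDPlusPlusFactorizer.java): one SGD
 * update of the SVD++ parameters for a single observed rating.
 */
public final class SvdPlusPlusSgdStep {

  /**
   * biases[0] is bu(user), biases[1] is bi(item); p is p(user), q is q(item);
   * yN holds the y(j) rows for j in N(user) and is updated in place.
   * Returns the prediction error before the update.
   */
  public static double step(double rating, double mu, double[] biases, double[] p, double[] q,
                            double[][] yN, double gamma, double lambda) {
    int k = p.length;
    // Assumption: an empty N(user) contributes nothing (the formula is undefined there).
    double norm = yN.length == 0 ? 0.0 : 1.0 / Math.sqrt(yN.length);

    // userVector = p(user) + |N(user)|^(-0.5) * sum_{j in N(user)} y(j)
    double[] userVector = new double[k];
    for (int f = 0; f < k; f++) {
      double implicitSum = 0.0;
      for (double[] yj : yN) {
        implicitSum += yj[f];
      }
      userVector[f] = p[f] + norm * implicitSum;
    }

    double prediction = mu + biases[0] + biases[1];
    for (int f = 0; f < k; f++) {
      prediction += userVector[f] * q[f];
    }
    double err = rating - prediction;

    biases[0] += gamma * (err - lambda * biases[0]);
    biases[1] += gamma * (err - lambda * biases[1]);
    for (int f = 0; f < k; f++) {
      double oldQ = q[f]; // p and y updates use the item factor from before this step
      q[f] += gamma * (err * userVector[f] - lambda * q[f]);
      p[f] += gamma * (err * oldQ - lambda * p[f]);
      for (double[] yj : yN) {
        yj[f] += gamma * (err * norm * oldQ - lambda * yj[f]);
      }
    }
    return err;
  }

  public static void main(String[] args) {
    double[] biases = {0.0, 0.0};
    double[] p = {0.1, 0.1};
    double[] q = {0.1, 0.1};
    double[][] yN = {{0.1, 0.1}, {0.1, 0.1}}; // y rows for the two items in N(user); made-up values
    double firstErr = step(4.0, 3.0, biases, p, q, yN, 0.01, 0.02);
    double lastErr = firstErr;
    for (int iter = 0; iter < 500; iter++) {
      lastErr = step(4.0, 3.0, biases, p, q, yN, 0.01, 0.02);
    }
    System.out.printf("error before: %.4f, after 500 more steps: %.4f%n", firstErr, lastErr);
  }
}
```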
                

[jira] [Resolved] (MAHOUT-1106) SVD++

Posted by "Sebastian Schelter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schelter resolved MAHOUT-1106.
----------------------------------------

    Resolution: Fixed
      Assignee: Sebastian Schelter  (was: Sean Owen)

Thank you very much again!
                

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Zeno Gantner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486356#comment-13486356 ] 

Zeno Gantner commented on MAHOUT-1106:
--------------------------------------

Another general comment: in my experience, there is not much to gain from using different randomNoise values; 0.1 should be fine, I think. I'm not sure whether we need it in the constructor.
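For context, a minimal sketch of what such a parameter usually controls, assuming (the thread does not confirm this) that randomNoise is the standard deviation of the Gaussian noise used to initialize the factor vectors before SGD starts; the class and method names are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical sketch: initializing factor vectors with Gaussian noise whose
 * standard deviation is randomNoise (assumed meaning of the parameter).
 */
public final class FactorInitialization {

  /** Returns a rows x k matrix whose entries are drawn from N(0, randomNoise^2). */
  public static double[][] initFactors(int rows, int k, double randomNoise, Random random) {
    double[][] factors = new double[rows][k];
    for (int r = 0; r < rows; r++) {
      for (int f = 0; f < k; f++) {
        factors[r][f] = random.nextGaussian() * randomNoise;
      }
    }
    return factors;
  }

  public static void main(String[] args) {
    double[][] userFactors = initFactors(1000, 10, 0.1, new Random(7L));
    double sumSq = 0.0;
    int count = 0;
    for (double[] row : userFactors) {
      for (double x : row) {
        sumSq += x * x;
        count++;
      }
    }
    // Root mean square of the entries; should be close to randomNoise = 0.1
    System.out.printf("empirical RMS of entries: %.4f%n", Math.sqrt(sumSq / count));
  }
}
```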
                

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486544#comment-13486544 ] 

Hudson commented on MAHOUT-1106:
--------------------------------

Integrated in Mahout-Quality #1724 (See [https://builds.apache.org/job/Mahout-Quality/1724/])
    MAHOUT-1106 SVD++ (Revision 1403522)

     Result = SUCCESS
ssc : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1403522
Files : 
* /mahout/trunk/core/src/main/java/org/apache/mahout/cf/taste/impl/recommender/svd/SVDPlusPlusFactorizer.java

                

[jira] [Commented] (MAHOUT-1106) SVD++

Posted by "Agnonchik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498782#comment-13498782 ] 

Agnonchik commented on MAHOUT-1106:
-----------------------------------

Thanks, Sean. I get your point.
                