Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/04/07 00:12:12 UTC

[jira] [Commented] (SPARK-6710) Wrong initial bias in GraphX SVDPlusPlus

    [ https://issues.apache.org/jira/browse/SPARK-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482063#comment-14482063 ] 

Reynold Xin commented on SPARK-6710:
------------------------------------

[~michaelmalak] would you like to submit a pull request for this?

> Wrong initial bias in GraphX SVDPlusPlus
> ----------------------------------------
>
>                 Key: SPARK-6710
>                 URL: https://issues.apache.org/jira/browse/SPARK-6710
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.3.0
>            Reporter: Michael Malak
>              Labels: easyfix
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In the initialization step of GraphX SVDPlusPlus, the biases appear to be set incorrectly. Specifically, at line 
> https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96 
> instead of 
> (vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1)) 
> it should probably be 
> (vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1)) 
> That is, the biases bu and bi (both stored as the third component of the Tuple4 above, depending on whether the vertex is a user or an item), described in equation (1) of the Koren paper, are supposed to be small offsets from the global mean (the variable u, standing in for the Greek letter mu) that account for the peculiarities of individual users and items. 
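>
> A sketch of that relationship in the usual baseline-predictor notation. The symbols r_{ui} (a rating) and R(u) (the set of items rated by user u) do not appear in this report and are introduced here for illustration only; \mu is the variable u in the code. It assumes that msg.get._1 is the rating count and msg.get._2 is the rating sum for the vertex.
>
> ```latex
> \begin{aligned}
> b_{ui} &= \mu + b_u + b_i \\
> b_u^{(0)} &= \frac{1}{|R(u)|} \sum_{i \in R(u)} r_{ui} - \mu
> \end{aligned}
> ```
>
> Under those assumptions, the second line is what the proposed expression msg.get._2 / msg.get._1 - u computes for a user vertex; an item bias would be initialized the same way over the users who rated that item.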
> In theory, wrong initial biases should not matter given enough iterations of the algorithm, but quick empirical testing shows that it has trouble converging at all, even with orders of magnitude more iterations. 
> This could be the source of the previously reported trouble with SVDPlusPlus: 
> http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html 
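
The difference between the two initializations can be illustrated without Spark. The following is a minimal standalone sketch, not the code in SVDPlusPlus.scala: the object name BiasInit, the helper names rawMeanBias and centeredBias, and the sample ratings are all hypothetical, and it assumes that each vertex message is a (rating count, rating sum) pair as the expressions in the issue suggest.

```scala
// Hypothetical standalone sketch; names and data are illustrative only
// and are not taken from SVDPlusPlus.scala.
object BiasInit {
  // Assumed shape of the per-vertex message: (number of ratings, sum of ratings).
  type Msg = (Long, Double)

  // Behaviour described in the issue: the bias starts at the vertex's raw mean rating.
  def rawMeanBias(msg: Msg): Double = msg._2 / msg._1

  // Proposed behaviour: the bias starts at the offset of that mean from the global mean u.
  def centeredBias(msg: Msg, u: Double): Double = msg._2 / msg._1 - u

  def main(args: Array[String]): Unit = {
    // Made-up (user, rating) pairs.
    val ratings = Seq(("alice", 5.0), ("alice", 4.0), ("bob", 2.0), ("bob", 1.0))

    // Global mean rating, playing the role of u (mu) in the issue.
    val u = ratings.map(_._2).sum / ratings.size

    // Aggregate a (count, sum) message per user.
    val perUser: Map[String, Msg] = ratings.groupBy(_._1).map {
      case (user, rs) => user -> ((rs.size.toLong, rs.map(_._2).sum))
    }

    perUser.toSeq.sortBy(_._1).foreach { case (user, msg) =>
      println(s"$user raw=${rawMeanBias(msg)} centered=${centeredBias(msg, u)}")
    }
  }
}
```

With these sample ratings the global mean is 3.0, so the raw initialization gives alice a bias of 4.5 and bob 1.5, while the centered one gives 1.5 and -1.5. Only the centered values behave as small offsets around the mean, which is the property the issue argues the biases should have.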



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
