Posted to dev@spark.apache.org by Michael Malak <mi...@yahoo.com.INVALID> on 2015/04/03 17:41:56 UTC

Wrong initial bias in GraphX SVDPlusPlus?

I believe the initialization of biases in GraphX SVDPlusPlus is incorrect. Specifically, at line 
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala#L96 
instead of 
(vd._1, vd._2, msg.get._2 / msg.get._1, 1.0 / scala.math.sqrt(msg.get._1)) 
it should be 
(vd._1, vd._2, msg.get._2 / msg.get._1 - u, 1.0 / scala.math.sqrt(msg.get._1)) 

That is, the biases bu and bi (both represented as the third component of the Tuple4[] above, depending on whether the vertex is a user or an item), described in equation (1) of the Koren paper, are supposed to be small offsets to the mean (represented by the variable u, signifying the Greek letter mu) to account for peculiarities of individual users and items. 
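
To make the difference concrete, here is a minimal sketch, not the actual SVDPlusPlus code. The object name, the helper names, and the sample numbers are all hypothetical, and it assumes the per-vertex message is a (rating count, rating sum) pair, as the expression msg.get._2 / msg.get._1 suggests. It looks only at the baseline part of the prediction, mu + bu + bi, and ignores the latent-factor term.

```scala
object BiasInitSketch {
  // Hypothetical stand-in for the per-vertex aggregate: (rating count, rating sum).
  type RatingStats = (Long, Double)

  // Initialization as currently written: the bias is the vertex's raw mean rating.
  def rawMeanBias(stats: RatingStats): Double =
    stats._2 / stats._1

  // Proposed initialization: the bias is the offset of that mean from the global mean u.
  def centeredBias(stats: RatingStats, u: Double): Double =
    stats._2 / stats._1 - u

  // Baseline part of the prediction only: mu + bu + bi.
  def baseline(u: Double, userBias: Double, itemBias: Double): Double =
    u + userBias + itemBias

  def main(args: Array[String]): Unit = {
    val u = 3.5                        // assumed global mean rating
    val user: RatingStats = (4L, 16.0) // hypothetical user, mean rating 4.0
    val item: RatingStats = (5L, 15.0) // hypothetical item, mean rating 3.0

    // Raw means as biases: 3.5 + 4.0 + 3.0
    println(baseline(u, rawMeanBias(user), rawMeanBias(item)))         // prints 10.5
    // Offsets from the mean as biases: 3.5 + 0.5 + (-0.5)
    println(baseline(u, centeredBias(user, u), centeredBias(item, u))) // prints 3.5
  }
}
```

With these made-up numbers, the uncentered biases give a starting baseline of 10.5, roughly three times the global mean, whereas the centered biases start at 3.5.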

Initializing these biases to wrong values should theoretically not matter given enough iterations of the algorithm, but some quick empirical testing shows that it has trouble converging at all, even with many orders of magnitude more iterations. 

This could be the source of the previously reported trouble with SVDPlusPlus: 
http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-SVDPlusPlus-problem-td12885.html 

If no one tells me I'm crazy within a day, I'll go ahead and create a Jira ticket. 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

Re: Wrong initial bias in GraphX SVDPlusPlus?

Posted by Sean Owen <so...@cloudera.com>.

See now: https://issues.apache.org/jira/browse/SPARK-6710

On Mon, Apr 6, 2015 at 4:27 AM, Reynold Xin <rx...@databricks.com> wrote:
> Adding Jianping Wang to the thread, since he contributed the SVDPlusPlus
> implementation.
>
> Jianping,
>
> Can you take a look at this message? Thanks.

Re: Wrong initial bias in GraphX SVDPlusPlus?

Posted by Reynold Xin <rx...@databricks.com>.

Adding Jianping Wang to the thread, since he contributed the SVDPlusPlus
implementation.

Jianping,

Can you take a look at this message? Thanks.

