Posted to issues@spark.apache.org by "Yves Raimond (JIRA)" <ji...@apache.org> on 2015/10/30 23:24:27 UTC

[jira] [Created] (SPARK-11432) Personalized PageRank shouldn't use uniform initialization

Yves Raimond created SPARK-11432:
------------------------------------

             Summary: Personalized PageRank shouldn't use uniform initialization
                 Key: SPARK-11432
                 URL: https://issues.apache.org/jira/browse/SPARK-11432
             Project: Spark
          Issue Type: Bug
          Components: GraphX
    Affects Versions: 1.5.1
            Reporter: Yves Raimond
            Priority: Minor


The current implementation of personalized PageRank in GraphX uses uniform initialization over the full graph: every vertex starts activated, not only the source vertex.

For example:

{code}
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Build a small property graph of users and their relationships.
val users: RDD[(VertexId, (String, String))] =
  sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
                       (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
val relationships: RDD[Edge[String]] =
  sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
                       Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
val defaultUser = ("John Doe", "Missing")
val graph = Graph(users, relationships, defaultUser)

// Run with zero iterations, so collect() shows the initial ranks unchanged.
graph.staticPersonalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
{code}

Running this with zero iterations shows every vertex initialized to resetProb (0.15), which differs from the behavior described in SPARK-5854, where only the source vertex should be activated. The risk is that, after a few iterations, the highest-ranked vertices end up being the source vertex and the vertices that the propagation never touched.
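
For comparison, here is a minimal sketch of the source-only initialization that SPARK-5854 describes, continuing the example above (the names {{src}} and {{initialRanks}} are illustrative, not the actual GraphX implementation):

{code}
// Hypothetical sketch: personalized initialization that activates only the
// source vertex, leaving every other vertex at rank 0.0.
val src: VertexId = 3L
val resetProb = 0.15
val initialRanks: Graph[Double, String] = graph.mapVertices { (id, _) =>
  if (id == src) resetProb else 0.0
}
initialRanks.vertices.collect.foreach(println)
// Only vertex 3 prints rank 0.15; vertices 2, 5, and 7 print 0.0.
{code}

With this starting point, rank can only reach a vertex through actual propagation from the source, so untouched vertices stay at 0.0 instead of retaining the uniform resetProb value.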



