Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/10/30 23:48:27 UTC

[jira] [Assigned] (SPARK-11432) Personalized PageRank shouldn't use uniform initialization

     [ https://issues.apache.org/jira/browse/SPARK-11432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-11432:
------------------------------------

    Assignee:     (was: Apache Spark)

> Personalized PageRank shouldn't use uniform initialization
> ----------------------------------------------------------
>
>                 Key: SPARK-11432
>                 URL: https://issues.apache.org/jira/browse/SPARK-11432
>             Project: Spark
>          Issue Type: Bug
>          Components: GraphX
>    Affects Versions: 1.5.1
>            Reporter: Yves Raimond
>            Priority: Minor
>
> The current implementation of personalized PageRank in GraphX initializes ranks uniformly over the full graph: every vertex is activated at the start.
> For example:
> {code}
> import org.apache.spark._
> import org.apache.spark.graphx._
> import org.apache.spark.rdd.RDD
> val users: RDD[(VertexId, (String, String))] =
>   sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
>                        (5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
> val relationships: RDD[Edge[String]] =
>   sc.parallelize(Array(Edge(3L, 7L, "collab"),    Edge(5L, 3L, "advisor"),
>                        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
> val defaultUser = ("John Doe", "Missing")
> val graph = Graph(users, relationships, defaultUser)
> graph.staticPersonalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This sets every vertex to resetProb (0.15), which differs from the behavior described in SPARK-5854, where only the source node should be activated.
> The risk is that, after a few iterations, the most activated nodes are the source node and the nodes left untouched by the propagation. For example, in the graph above, vertex 2L will always have an activation of 0.15:
> {code}
> graph.personalizedPageRank(3L, 0, 0.15).vertices.collect.foreach(println)
> {code}
> This gives 2L a higher score than 7L and 5L, even though there is no outbound path from 3L to 2L.
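
For illustration only, here is a minimal plain-Scala sketch (no Spark dependency) of the source-only initialization the description asks for, run on the same four-vertex graph. It is not the GraphX implementation: the object name PersonalizedPageRankSketch, the method personalizedRanks, the starting rank of 1.0 for the source, and the update rule (reset mass added only at the source, the rest propagated along out-edges) are all assumptions made for this example.

```scala
object PersonalizedPageRankSketch {
  // Edges of the example graph as (srcId, dstId) pairs; edge labels are dropped.
  val edges: Seq[(Long, Long)] = Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L))
  val vertices: Seq[Long] = Seq(2L, 3L, 5L, 7L)

  // Hypothetical helper: runs a fixed number of iterations and returns rank per vertex.
  def personalizedRanks(source: Long, numIter: Int, resetProb: Double): Map[Long, Double] = {
    val outDegree: Map[Long, Int] =
      edges.groupBy(_._1).map { case (v, es) => v -> es.size }

    // Assumption: only the source vertex starts activated; all others start at 0.0.
    var ranks: Map[Long, Double] =
      vertices.map(v => v -> (if (v == source) 1.0 else 0.0)).toMap

    for (_ <- 0 until numIter) {
      // Sum of rank / outDegree over the in-neighbours of each destination vertex.
      val incoming: Map[Long, Double] = edges.groupBy(_._2).map { case (dst, es) =>
        dst -> es.map { case (src, _) => ranks(src) / outDegree(src) }.sum
      }
      // Assumption: the reset mass goes to the source only, never to other vertices.
      ranks = vertices.map { v =>
        val reset = if (v == source) resetProb else 0.0
        v -> (reset + (1.0 - resetProb) * incoming.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

Calling PersonalizedPageRankSketch.personalizedRanks(3L, 5, 0.15) leaves vertex 2L at 0.0 for any number of iterations, because 2L has no inbound edges and receives no reset mass under this scheme. It therefore cannot outrank 7L, which is reachable from 3L.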


