Posted to issues@spark.apache.org by "Nikhil Bhide (JIRA)" <ji...@apache.org> on 2017/09/01 11:43:00 UTC

[jira] [Commented] (SPARK-21861) Add more details to PageRank illustration

    [ https://issues.apache.org/jira/browse/SPARK-21861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150389#comment-16150389 ] 

Nikhil Bhide commented on SPARK-21861:
--------------------------------------

Hi Sean,
Please find the additional content below. I have added a few comments to the description section (highlighted) and slightly modified the example (highlighted).
To summarize:
1. Added details about the damping factor and reset probability
2. Added details of the Personalized PageRank algorithm supported in GraphX
3. Modified the example
    - Sorted results in descending order by weight (rank)
    - Added an example of Personalized PageRank (PPR)


PageRank measures the importance of each vertex in a graph, assuming an edge from u to v represents an endorsement of v’s importance by u. For example, if a Twitter user is followed by many others, the user will be ranked highly. *PageRank works by counting the number and quality of links to a node to estimate that node's importance.*
GraphX comes with static and dynamic implementations of PageRank as methods on the PageRank object. Static PageRank runs for a fixed number of iterations, while dynamic PageRank runs until the ranks converge (i.e., stop changing by more than a specified tolerance).
*The dynamic version (pageRank on GraphOps) takes two parameters: the tolerance and the reset probability. The static version (staticPageRank on GraphOps) also takes two parameters: the number of iterations and the reset probability. The reset probability is tied to the damping factor, which is the click-through probability. PageRank is based on the random surfer model: the damping factor is the probability that the surfer keeps following links, and the reset probability is the probability that the surfer jumps to a random page instead. The damping factor ranges between 0 and 1. By default, the damping factor is 0.85, and the reset probability is calculated as 1 - damping factor, i.e. 0.15.*
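To make the role of the reset probability concrete, here is a minimal, Spark-free sketch in plain Scala of a fixed-iteration PageRank update. It is an illustration only, not GraphX's implementation: the object name PageRankSketch, the edge-list representation, and the unnormalized update rule (rank = resetProb + (1 - resetProb) * sum of incoming contributions) are assumptions chosen for this example.

```scala
object PageRankSketch {
  // Hypothetical helper for illustration; edges are (source, destination) pairs,
  // where an edge u -> v is read as an endorsement of v by u.
  def staticPageRank(
      edges: Seq[(Int, Int)],
      numIter: Int,
      resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = edges.flatMap { case (u, v) => Seq(u, v) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (u, es) => u -> es.size }

    // Assumption: every vertex starts with rank 1.0.
    var ranks: Map[Int, Double] = vertices.map(v => v -> 1.0).toMap

    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly among its out-links.
      val contribs: Map[Int, Double] = edges
        .map { case (u, v) => v -> (ranks(u) / outDegree(u)) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }

      // resetProb models the random jump; (1 - resetProb) is the damping factor.
      ranks = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

With two vertices pointing at vertex 1, e.g. PageRankSketch.staticPageRank(Seq((2, 1), (3, 1)), numIter = 10), vertex 1 ends up with a higher rank than vertices 2 and 3, which receive only the reset term.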
*GraphX also supports Personalized PageRank (PPR), a more general version of PageRank in which ranks are computed relative to a source vertex. PPR is widely used in recommendation systems. For example, Twitter uses PPR to suggest other accounts that a user may wish to follow. GraphX provides static and dynamic implementations of Personalized PageRank as methods on the PageRank object. GraphOps allows calling these algorithms directly as methods on Graph.*
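The difference between ordinary and Personalized PageRank can be sketched in plain Scala without Spark: in the personalized variant, the random jump always returns to the chosen source vertex, so vertices that cannot be reached from the source end up with rank 0. This is a simplified illustration under stated assumptions, not GraphX's implementation; the object name PersonalizedPageRankSketch, the edge-list representation, and the unnormalized update rule are all hypothetical choices made for this example.

```scala
object PersonalizedPageRankSketch {
  // Hypothetical helper for illustration; edges are (source, destination) pairs.
  def run(
      edges: Seq[(Int, Int)],
      source: Int,
      numIter: Int,
      resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = (edges.flatMap { case (u, v) => Seq(u, v) } :+ source).distinct
    val outDegree = edges.groupBy(_._1).map { case (u, es) => u -> es.size }

    // Assumption: all rank mass starts at the source vertex.
    var ranks: Map[Int, Double] =
      vertices.map(v => v -> (if (v == source) 1.0 else 0.0)).toMap

    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly among its out-links.
      val contribs: Map[Int, Double] = edges
        .map { case (u, v) => v -> (ranks(u) / outDegree(u)) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }

      // Only the source vertex receives the reset term.
      ranks = vertices.map { v =>
        val reset = if (v == source) resetProb else 0.0
        v -> (reset + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

For the chain 1 -> 2 -> 3 with an extra edge 4 -> 3 and source vertex 1, rank flows from vertex 1 along the chain, while vertex 4 stays at 0 because nothing reachable from the source links to it.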

    import org.apache.spark.graphx.GraphLoader

    // Load the edges as a graph
    val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
    // Run PageRank
    val ranks = graph.pageRank(0.0001).vertices
    // Join the ranks with the usernames
    val users = sc.textFile("data/graphx/users.txt").map { line =>
      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }
    val ranksByUsername = users.join(ranks).map {
      case (id, (username, rank)) => (username, rank)
    }
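    // Print the result, sorted by rank in descending order (added)
    println(ranksByUsername.sortBy({ case (username, rank) => rank }, ascending = false).collect().mkString("\n"))

    // Run Personalized PageRank (PPR), using the first vertex in the graph as the source (added)
    val ranksPPR = graph.personalizedPageRank(graph.vertices.first()._1, 0.0001).vertices
    // Join the personalized ranks with the usernames
    val ranksPPRByUsername = users.join(ranksPPR).map {
      case (id, (username, rank)) => (username, rank)
    }
    // Print the result, sorted by rank in descending order
    println(ranksPPRByUsername.sortBy({ case (username, rank) => rank }, ascending = false).collect().mkString("\n"))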

> Add more details to PageRank illustration
> -----------------------------------------
>
>                 Key: SPARK-21861
>                 URL: https://issues.apache.org/jira/browse/SPARK-21861
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.2.0
>            Reporter: Nikhil Bhide
>            Priority: Trivial
>              Labels: documentation
>
> Add more details to PageRank illustration on [https://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank]
> Adding details of the PageRank algorithm's parameters, such as the damping factor, would be helpful. Also, adding more actions on the result, such as sorting by weight, would be useful.


