Posted to user@spark.apache.org by Alexander Czech <al...@googlemail.com.INVALID> on 2018/11/05 10:20:16 UTC

How to use the GraphFrames PageRank method with dangling edges?

I have a graph with a couple of dangling edges. I am using PySpark
with Spark 2.2.0. The graph looks roughly like this:

g.vertices.show()
+---+
| id|
+---+
|  1|
|  2|
|  3|
|  4|
+---+
g.edges.show()
+---+----+
|src| dst|
+---+----+
|  1|   2|
|  2|   3|
|  3|   4|
|  4|   1|
|  4|null|
+---+----+

Now when I call g.pageRank(resetProbability=0.15, tol=0.01), it
obviously fails because one edge points to null.
What I want is for any PageRank weight that flows along a dangling
edge to be redistributed to a random node in the graph, much like the
damping factor (resetProbability) in the pageRank function. Is this
possible without rewriting the pageRank method? My Scala knowledge is
zero, and reimplementing it in Python would probably be pretty slow.
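
To make the desired behaviour concrete, here is a minimal plain-Python sketch of it on the toy graph above. It is not the GraphFrames implementation and is not meant to scale. The function name pagerank_with_dangling and the max_iter parameter are made up for illustration, and "randomly distributed" is assumed to mean "spread uniformly over all vertices". Ranks here are normalized to sum to 1, so the scale may differ from what g.pageRank returns.

```python
def pagerank_with_dangling(vertices, edges, reset_probability=0.15,
                           tol=0.01, max_iter=100):
    """Illustrative PageRank where weight sent along a dangling edge
    (dst is None) is spread uniformly over all vertices.

    Hypothetical helper, not part of GraphFrames or Spark.
    """
    n = len(vertices)

    # Count every outgoing edge, including the dangling ones, so that a
    # vertex splits its rank evenly across all of its edges.
    out_degree = {v: 0 for v in vertices}
    for src, _dst in edges:
        out_degree[src] += 1

    rank = {v: 1.0 / n for v in vertices}

    for _ in range(max_iter):
        incoming = {v: 0.0 for v in vertices}
        dangling_mass = 0.0

        for src, dst in edges:
            share = rank[src] / out_degree[src]
            if dst is None:
                dangling_mass += share   # weight that would be lost
            else:
                incoming[dst] += share

        # Vertices with no outgoing edges at all are treated the same way.
        for v in vertices:
            if out_degree[v] == 0:
                dangling_mass += rank[v]

        new_rank = {
            v: reset_probability / n
               + (1.0 - reset_probability) * (incoming[v] + dangling_mass / n)
            for v in vertices
        }

        delta = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:
            break

    return rank


# The toy graph from this post: vertex 4 has one real and one dangling edge.
vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, None)]
ranks = pagerank_with_dangling(vertices, edges)
```

Because the dangling weight is added back in each iteration, the total rank stays constant instead of leaking out of the graph through the null edge.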

Thanks