Posted to issues@spark.apache.org by "bonnal-enzo (Jira)" <ji...@apache.org> on 2021/05/09 17:47:00 UTC

[jira] [Created] (SPARK-35357) Allow to turn off the normalization applied by static PageRank utilities

bonnal-enzo created SPARK-35357:
-----------------------------------

             Summary: Allow to turn off the normalization applied by static PageRank utilities
                 Key: SPARK-35357
                 URL: https://issues.apache.org/jira/browse/SPARK-35357
             Project: Spark
          Issue Type: Improvement
          Components: GraphX
    Affects Versions: 3.1.1
            Reporter: bonnal-enzo


Since SPARK-18847, the static PageRank computations available in `PageRank.scala` normalize the sum of the ranks after the fixed number of iterations has completed, and *there is no way for a developer to access the raw, non-normalized rank values*.

Since SPARK-29877 one can run a fixed number of PageRank iterations starting from a previous `preRankGraph`'s ranks.
 This nice feature opens the door to interesting *incremental algorithms*, for example (see the sketch below):
 "Run some initial PageRank iterations using `PageRank.runWithOptions`, then update the graph's edges and refresh the ranks with a call to `PageRank.runWithOptionsWithPreviousPageRank`, and so on."

This kind of algorithm would benefit substantially (in precision) from direct access to the raw ranks (rather than the normalized ones) when the graph has a large proportion of sinks (vertices without outgoing edges).

It would be nice to add method signatures with a boolean that turns off the automatic normalization performed at the end of `PageRank.runWithOptions` and `PageRank.runWithOptionsWithPreviousPageRank`, leaving developers free to apply the normalization only when they really need it.
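A possible shape for the proposal (the `normalizeRanks` name and its default are only illustrative, not existing API):

{code:scala}
// Hypothetical signature sketch, not part of the current API: a flag defaulting
// to true would preserve the existing behaviour while letting callers opt out
// of the final normalization and keep the raw rank values.
def runWithOptions[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED],
    numIter: Int,
    resetProb: Double = 0.15,
    srcId: Option[VertexId] = None,
    normalizeRanks: Boolean = true): Graph[Double, Double]
{code}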



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org