Posted to issues@spark.apache.org by "De-En Lin (JIRA)" <ji...@apache.org> on 2019/05/16 02:56:00 UTC
[jira] [Comment Edited] (SPARK-27718) incorrect result from pagerank
[ https://issues.apache.org/jira/browse/SPARK-27718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840945#comment-16840945 ]
De-En Lin edited comment on SPARK-27718 at 5/16/19 2:55 AM:
------------------------------------------------------------
On Wikipedia, the PageRank equation is given as follows:
!螢幕快照 2019-05-16 上午10.09.45.png!
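The attached screenshot is not reproduced in this plain-text copy. Assuming it shows the standard formula from the Wikipedia article on PageRank, it would read as below, where d is the damping factor, N is the total number of pages, M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outbound links of p_j:

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```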
Line 85 of the pagerank.py example in Spark reads:
{code:python}
ranks = contribs.reduceByKey(add).mapValues(lambda rank: rank * 0.85 + 0.15)
{code}
Here d is 0.85, and rank is the sum of PR(pj)/L(pj) over the incoming links, as in the PageRank equation.
However, the term (1-d)/N appears in the code as 0.15.
The code forgets to divide by N.
Therefore, I changed the code to
{code:python}
# num_vals is N, the number of nodes; 0.15 / num_vals also avoids integer division under Python 2
ranks = contribs.reduceByKey(add).mapValues(
    lambda rank: rank * 0.85 + 0.15 / num_vals)
{code}
With enough iterations, the result is then correct and consistent with the NetworkX result.
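To make the difference between the two update rules concrete, here is a minimal sketch in plain Python rather than Spark. The helper name pagerank, its parameters, and the four-node graph are assumptions for illustration (the issue does not list the edges), and the sketch only handles graphs in which every node has at least one outgoing link.

```python
def pagerank(links, damping=0.85, iterations=200, normalize=True):
    """Power iteration over a dict {node: [destinations]}.

    normalize=True  uses the teleport term (1 - d) / N.
    normalize=False uses the teleport term (1 - d), like the `+ 0.15` update.
    Assumes every node appears as a key and has at least one outgoing link.
    """
    nodes = sorted(links)
    n = len(nodes)
    teleport = (1.0 - damping) / n if normalize else (1.0 - damping)
    start = 1.0 / n if normalize else 1.0
    ranks = {node: start for node in nodes}
    for _ in range(iterations):
        # Each node splits its current rank evenly among its destinations.
        contribs = {node: 0.0 for node in nodes}
        for src, dests in links.items():
            share = ranks[src] / len(dests)
            for dst in dests:
                contribs[dst] += share
        ranks = {node: damping * contribs[node] + teleport for node in nodes}
    return ranks


# Hypothetical graph: node 1 links to 2, 3 and 4, and each of them links back to 1.
links = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}
normalized = pagerank(links, normalize=True)
unnormalized = pagerank(links, normalize=False)

for node in sorted(links):
    print(node, normalized[node], unnormalized[node])
```

On this graph the normalized ranks sum to 1 and the unnormalized ranks sum to N, so in this sketch the two variants differ only by a factor of N. That holds here because no node lacks outgoing links; it is not a claim about the Spark example in general.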
> incorrect result from pagerank
> ------------------------------
>
> Key: SPARK-27718
> URL: https://issues.apache.org/jira/browse/SPARK-27718
> Project: Spark
> Issue Type: Bug
> Components: Examples
> Affects Versions: 2.4.1
> Reporter: De-En Lin
> Priority: Minor
> Attachments: 螢幕快照 2019-05-16 上午10.09.45.png
>
>
> When I executed /examples/src/main/python/pagerank.py,
> the result was as follows:
>
> {code:java}
> 1 has rank: 0.5821576292853757.
> 2 has rank: 0.3361551945789305.
> 3 has rank: 0.3361551945789305.
> 4 has rank: 0.3361551945789305.
> {code}
>
> However, running the same graph through networkx-pagerank gives
> the following result:
> {code:java}
> {1: 0.4797305739863632, 2: 0.1734231420045456, 3: 0.1734231420045456, 4: 0.1734231420045456}
> {code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)