Posted to issues@spark.apache.org by "De-En Lin (JIRA)" <ji...@apache.org> on 2019/05/16 02:56:00 UTC

[jira] [Comment Edited] (SPARK-27718) incorrect result from pagerank

    [ https://issues.apache.org/jira/browse/SPARK-27718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840945#comment-16840945 ] 

De-En Lin edited comment on SPARK-27718 at 5/16/19 2:55 AM:
------------------------------------------------------------

On Wikipedia, the PageRank equation is given as follows:

!螢幕快照 2019-05-16 上午10.09.45.png!
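For readers without the attached screenshot, this is the formula from the Wikipedia PageRank article, transcribed in LaTeX. N is the number of pages, d is the damping factor, M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outbound links on page p_j:

{noformat}
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
{noformat}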

At line 85 of Spark's example pagerank.py:

 
{code:python}
ranks = contribs.reduceByKey(add).mapValues(lambda rank: rank * 0.85 + 0.15)
{code}
 

Here d is 0.85, and rank is the sum of PR(p_j)/L(p_j), which corresponds to the summation term of the PageRank equation.

However, the term (1-d)/N in the code is just 0.15.

It forgets to divide by N.
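Why the missing division matters can be seen at the fixed point. Writing r_i for the rank the example assigns to page p_i (a symbol introduced here only for illustration; M(p_i) is the set of pages linking to p_i and L(p_j) the number of outbound links on p_j), the example iterates the first equation below. Substituting r_i = N * PR(p_i) and dividing both sides by N gives exactly the Wikipedia equation, so, assuming the iteration has converged and ignoring pages without outgoing links, the example's ranks should be N times the PageRank values (N = 4 for the four-page graph in this issue). This says nothing about how far a short run is from convergence.

{noformat}
r_i = (1 - d) + d \sum_{p_j \in M(p_i)} \frac{r_j}{L(p_j)}

r_i = N \cdot PR(p_i) \;\Longrightarrow\; PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
{noformat}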

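Therefore, I changed the code to:
{code:python}
# num_vals is the total number of pages N, so (1 - d) / N == 0.15 / num_vals.
# Dividing the float 0.15 avoids integer division (1 / num_vals == 0) under Python 2.
ranks = contribs.reduceByKey(add).mapValues(
    lambda rank: rank * 0.85 + 0.15 / num_vals)
{code}
With enough iterations, the result is then correct and consistent with the result from NetworkX.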
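To check the fix without a Spark cluster, here is a minimal pure-Python sketch of the same update loop as pagerank.py, run once with the constant 0.15 and once with 0.15 / N. The helper name pagerank and the input graph are hypothetical: the issue does not list its edges, so the sketch assumes page 1 links to pages 2, 3 and 4 and each of those links back to page 1, a graph whose normalized result reproduces the NetworkX values quoted in the issue. It does not handle pages without outgoing links.

{code:python}
def pagerank(links, iterations=200, d=0.85, normalize=True):
    """Hypothetical helper mirroring the update loop in pagerank.py.

    links maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link (no dangling-page handling).
    With normalize=False the constant term is (1 - d), i.e. 0.15, as in the
    example; with normalize=True it is (1 - d) / N, as in the proposed fix.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    base = (1 - d) / n if normalize else (1 - d)
    ranks = {p: 1.0 for p in pages}  # pagerank.py also starts every rank at 1.0
    for _ in range(iterations):
        # Each page splits its current rank evenly among its outgoing links.
        contribs = {p: 0.0 for p in pages}
        for src, outs in links.items():
            for dst in outs:
                contribs[dst] += ranks[src] / len(outs)
        # Same shape as mapValues(lambda rank: rank * 0.85 + base).
        ranks = {p: contribs[p] * d + base for p in pages}
    return ranks


# Hypothetical input: page 1 links to 2, 3 and 4; each of those links back to 1.
links = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}

fixed = pagerank(links, normalize=True)      # with the division by N
original = pagerank(links, normalize=False)  # the example as written

print({p: round(r, 4) for p, r in fixed.items()})
# {1: 0.4797, 2: 0.1734, 3: 0.1734, 4: 0.1734}
print({p: round(r, 4) for p, r in original.items()})
# {1: 1.9189, 2: 0.6937, 3: 0.6937, 4: 0.6937}
{code}

Starting every rank at 1.0 mirrors the example. Because each update shrinks the difference from the fixed point by a factor of d, the starting values only affect how many iterations are needed, not the converged ranks; here the unnormalized ranks end up four times the normalized ones.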

 

 


was (Author: f422661): identical to the comment above, except that the sentence about d and rank ended at "PR(pj)/L(pj)." without "corresponding to the equation of PageRank".

> incorrect result from pagerank
> ------------------------------
>
>                 Key: SPARK-27718
>                 URL: https://issues.apache.org/jira/browse/SPARK-27718
>             Project: Spark
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.4.1
>            Reporter: De-En Lin
>            Priority: Minor
>         Attachments: 螢幕快照 2019-05-16 上午10.09.45.png
>
>
> When I executed /examples/src/main/python/pagerank.py, the result was as follows:
>  
> {noformat}
> 1 has rank: 0.5821576292853757.
> 2 has rank: 0.3361551945789305.
> 3 has rank: 0.3361551945789305.
> 4 has rank: 0.3361551945789305.
> {noformat}
>  
> However, running NetworkX's pagerank on the same graph gives the following result:
> {noformat}
> {1: 0.4797305739863632, 2: 0.1734231420045456, 3: 0.1734231420045456, 4: 0.1734231420045456}
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)