Posted to reviews@spark.apache.org by pfontana3w2 <gi...@git.apache.org> on 2014/08/22 22:39:06 UTC
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
GitHub user pfontana3w2 opened a pull request:
https://github.com/apache/spark/pull/2100
Error in Page Rank Computation in PageRank.scala
I believe there is an error in the PageRank computation in `runUntilConvergence()` in PageRank.scala: the vertex update adds oldPR where I would expect resetProb. Note that the `run()` method in PageRank.scala uses resetProb, as my correction does here (see lines 95–96 of PageRank.scala).
Here is the diff that I see (in case it is hidden later):
```scala
- val newPR = oldPR + (1.0 - resetProb) * msgSum
+ // Equation: resetProb + (1 - resetProb) * msgSum
+ val newPR = resetProb + (1.0 - resetProb) * msgSum
```
If I am incorrect, feel free to make the proper correction.
Best Wishes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pfontana3w2/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2100.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2100
----
commit 493d7357f429ede73dc3e59c1e21c31c215fd87e
Author: Peter Fontana <pe...@nist.gov>
Date: 2014-08-22T20:32:50Z
Fixed PageRank to add resetProb, not oldPR
commit 18eb2319beff5edd5d9914cb286e407ab1303daa
Author: Peter Fontana <pe...@nist.gov>
Date: 2014-08-22T20:36:42Z
Corrected PageRankFile
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53117881
Can one of the admins verify this patch?
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53139983
[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19089/consoleFull) for PR 2100 at commit [`18eb231`](https://github.com/apache/spark/commit/18eb2319beff5edd5d9914cb286e407ab1303daa).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53138922
[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19089/consoleFull) for PR 2100 at commit [`18eb231`](https://github.com/apache/spark/commit/18eb2319beff5edd5d9914cb286e407ab1303daa).
* This patch merges cleanly.
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53271700
It looks like I may be mistaken (since the unit tests failed), so I am closing this pull request.
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53118652
Do take a look. I noticed that with this patch, dangling nodes no longer have page ranks equal to their reset probabilities, so there may not be an error, or my solution may not be the right one. Here is a small test that I did.
Edge Table:
| Node1 | Node2 |
| ----- | ----- |
| 1 | 2 |
| 1 | 3 |
| 3 | 2 |
| 3 | 4 |
| 5 | 3 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 7 |
Node Table:
| NodeName | NodeID |
| -------- | ------ |
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
| f | 6 |
| g | 7 |
| h | 8 |
| i | 9 |
| j.longaddress.com | 10 |
Page Ranks Before Patch:
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)
Page Ranks After Patch:
(4,0.2488125)
(1,0.3)
(6,0.3)
(3,0.5325)
(7,0.23925000000000002)
(9,0.19335000000000002)
(8,0.45600000000000007)
(10,0.3)
(5,0.3)
(2,0.2488125)
Note that node 10 is not in the edge table
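The pre-patch numbers for the acyclic part of this graph can be checked by hand. Below is a minimal plain-Scala sketch (no Spark dependency) that iterates the update `resetProb + (1 - resetProb) * msgSum` on the edge table above. The object name `StaticPageRankSketch`, the helper names, and the starting rank of 1.0 are assumptions made for illustration; this is not code from PageRank.scala.

```scala
object StaticPageRankSketch {
  // Edge table from the comment above, as (source, destination) pairs.
  val edges: Seq[(Int, Int)] = Seq(
    (1, 2), (1, 3), (3, 2), (3, 4), (5, 3),
    (6, 7), (7, 8), (8, 9), (9, 7))

  // Node 10 has no edges but still receives a rank.
  val vertices: Seq[Int] = 1 to 10

  val resetProb = 0.15

  // Number of outgoing edges per source vertex.
  private val outDegree: Map[Int, Int] =
    edges.groupBy(_._1).map { case (src, out) => src -> out.size }

  /** One update: resetProb + (1 - resetProb) * sum over in-edges of rank(src) / outDegree(src). */
  def step(ranks: Map[Int, Double]): Map[Int, Double] = {
    val msgSum: Map[Int, Double] = edges.groupBy(_._2).map { case (dst, in) =>
      dst -> in.map { case (src, _) => ranks(src) / outDegree(src) }.sum
    }
    // Vertices without incoming edges receive no messages, so their sum is 0.0.
    vertices.map(v => v -> (resetProb + (1.0 - resetProb) * msgSum.getOrElse(v, 0.0))).toMap
  }

  /** Applies `step` a fixed number of times, starting from an assumed rank of 1.0 everywhere. */
  def run(numIter: Int): Map[Int, Double] = {
    val initial: Map[Int, Double] = vertices.map(v => v -> 1.0).toMap
    (1 to numIter).foldLeft(initial)((ranks, _) => step(ranks))
  }
}
```

For example, `StaticPageRankSketch.run(10)(3)` comes out at about 0.34125, which is 0.15 + 0.85 * (0.15 / 2 + 0.15) and agrees with the "before patch" value for node 3. Nodes with no incoming edges (1, 5, 6 and 10) stay at 0.15.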
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53324119
Thank you for taking the time to explain the code.
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53311125
To answer your concern, the `runUntilConvergence` version of PageRank uses delta messages: `msgSum` is computed over the delta graph, in which the rank of each vertex is the change in its PageRank from one iteration to the next. You can see that on lines 144 and 149: the `sendMessage` function uses `edge.srcAttr._2` as the source rank, which was set to `newPR - oldPR`.
As a result, we have to add in `oldPR` at each step to obtain the PageRank of the input graph rather than the delta graph.
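To make the delta argument concrete, here is a minimal plain-Scala sketch (no Spark dependency) that compares a full-rank iteration with a delta-message iteration. The object `DeltaPageRankSketch` and its functions are hypothetical names for illustration. The sketch also assumes a fixed number of iterations, no convergence tolerance, and that after the first step every rank (and hence every delta) equals `resetProb`; the real `runUntilConvergence` may differ in those details.

```scala
object DeltaPageRankSketch {
  val resetProb = 0.15

  /** For each vertex, sum value(src) / outDegree(src) over its incoming edges. */
  def msgSum(
      vertices: Seq[Int],
      edges: Seq[(Int, Int)],
      value: Map[Int, Double]): Map[Int, Double] = {
    val outDegree = edges.groupBy(_._1).map { case (src, out) => src -> out.size }
    val sums = edges.groupBy(_._2).map { case (dst, in) =>
      dst -> in.map { case (src, _) => value(src) / outDegree(src) }.sum
    }
    vertices.map(v => v -> sums.getOrElse(v, 0.0)).toMap
  }

  /** Messages carry full ranks, so each step rebuilds the rank from resetProb. */
  def fullRanks(vertices: Seq[Int], edges: Seq[(Int, Int)], numIter: Int): Map[Int, Double] = {
    var ranks = vertices.map(v => v -> 0.0).toMap
    for (_ <- 1 to numIter) {
      val sums = msgSum(vertices, edges, ranks)
      ranks = vertices.map(v => v -> (resetProb + (1.0 - resetProb) * sums(v))).toMap
    }
    ranks
  }

  /** Messages carry only the last change in rank, so each step adds to oldPR. Expects numIter >= 1. */
  def deltaRanks(vertices: Seq[Int], edges: Seq[(Int, Int)], numIter: Int): Map[Int, Double] = {
    // Assumed state after the first step: rank == delta == resetProb for every vertex.
    var ranks = vertices.map(v => v -> resetProb).toMap
    var deltas = ranks
    for (_ <- 2 to numIter) {
      val sums = msgSum(vertices, edges, deltas)
      val newRanks = vertices.map { v =>
        val oldPR = ranks(v)
        v -> (oldPR + (1.0 - resetProb) * sums(v))
      }.toMap
      // The next round of messages is built from newPR - oldPR only.
      deltas = vertices.map(v => v -> (newRanks(v) - ranks(v))).toMap
      ranks = newRanks
    }
    ranks
  }
}
```

Under these assumptions the two functions agree up to floating-point rounding for any `numIter >= 1`. Swapping `oldPR` for `resetProb` inside `deltaRanks` would discard everything accumulated in earlier iterations, because `sums` only reflects the most recent change.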
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/2100#issuecomment-53138797
ok to test
[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...
Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 closed the pull request at:
https://github.com/apache/spark/pull/2100