Posted to reviews@spark.apache.org by pfontana3w2 <gi...@git.apache.org> on 2014/08/22 22:39:06 UTC

[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

GitHub user pfontana3w2 opened a pull request:

    https://github.com/apache/spark/pull/2100

    Error in Page Rank Computation in PageRank.scala

    I saw an error in the PageRank computation for runUntilConvergence() in PageRank.scala. It adds oldPR instead of resetProb. Note that the run() method in PageRank.scala uses resetProb, as my correction does here (see lines 95–96 of PageRank.scala).
    
    Here is the diff that I see (in case it is hidden later):
    ```scala
    -      val newPR = oldPR + (1.0 - resetProb) * msgSum
    +      // Equation: resetProb + (1 - resetProb) * msgSum
    +      val newPR = resetProb + (1.0 - resetProb) * msgSum
    ```
    
    If I am incorrect, feel free to make the proper correction.
    
    Best Wishes

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pfontana3w2/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2100
    
----
commit 493d7357f429ede73dc3e59c1e21c31c215fd87e
Author: Peter Fontana <pe...@nist.gov>
Date:   2014-08-22T20:32:50Z

    Fixed PageRank to add resetProb, not oldPR

commit 18eb2319beff5edd5d9914cb286e407ab1303daa
Author: Peter Fontana <pe...@nist.gov>
Date:   2014-08-22T20:36:42Z

    Corrected PageRankFile

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53117881
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53139983
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19089/consoleFull) for PR 2100 at commit [`18eb231`](https://github.com/apache/spark/commit/18eb2319beff5edd5d9914cb286e407ab1303daa).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53138922
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19089/consoleFull) for PR 2100 at commit [`18eb231`](https://github.com/apache/spark/commit/18eb2319beff5edd5d9914cb286e407ab1303daa).
     * This patch merges cleanly.



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53271700
  
    It looks like I may be mistaken (since the unit tests failed), so I am closing this pull request.



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53118652
  
    Please take a look. I noticed that with this patch, dangling nodes no longer have PageRank values equal to the reset probability, so either the original code has no error or my fix is not the right one. Here is a small test that I ran.
    
    Edge Table:
    
    | Node1 | Node2 |
    | ----- | ----- |
    | 1     | 2     |
    | 1     | 3     |
    | 3     | 2     |
    | 3     | 4     |
    | 5     | 3     |
    | 6     | 7     |
    | 7     | 8     |
    | 8     | 9     |
    | 9     | 7     |
    
    Node Table:
    
    | NodeName          | NodeID |
    | ----------------- | ------ |
    | a                 | 1      |
    | b                 | 2      |
    | c                 | 3      |
    | d                 | 4      |
    | e                 | 5      |
    | f                 | 6      |
    | g                 | 7      |
    | h                 | 8      |
    | i                 | 9      |
    | j.longaddress.com | 10     |
    
    Page Ranks Before Patch:
    
    (4,0.29503124999999997)
    (1,0.15)
    (6,0.15)
    (3,0.34124999999999994)
    (7,1.3299054047985106)
    (9,1.2381240056453071)
    (8,1.2803346052504254)
    (10,0.15)
    (5,0.15)
    (2,0.35878124999999994)
    
    
    Page Ranks After Patch:
    
    (4,0.2488125)
    (1,0.3)
    (6,0.3)
    (3,0.5325)
    (7,0.23925000000000002)
    (9,0.19335000000000002)
    (8,0.45600000000000007)
    (10,0.3)
    (5,0.3)
    (2,0.2488125)
    
    Note that node 10 does not appear in the edge table.




[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53324119
  
    Thank you for taking the time to explain the code.



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53311125
  
    To answer your concern: the `runUntilConvergence` version of PageRank uses delta messages. `msgSum` operates on the delta graph, in which the rank of each vertex is the change in PageRank from one iteration to the next. You can see this on lines 144 and 149: the `sendMessage` function uses `edge.srcAttr._2` as the source rank, which was set to `newPR - oldPR`.
    
    As a result, we have to add in `oldPR` at each step to obtain the PageRank of the input graph rather than the delta graph.
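
Editorial sketch: the explanation above can be checked on the edge table posted earlier in this thread. The code below is a minimal illustration in plain Scala collections, not the GraphX implementation. The object name `DeltaPageRankSketch` and the helpers `msgSums`, `staticRanks`, and `deltaRanks` are hypothetical. It assumes synchronous iterations, no convergence tolerance, a static iteration that starts from rank 0, and a delta iteration whose first step leaves every vertex with rank and delta equal to `resetProb`.

```scala
object DeltaPageRankSketch {
  // Edge list (src -> dst) copied from the edge table in this thread; vertex 10 has no edges.
  val edges: Seq[(Int, Int)] =
    Seq(1 -> 2, 1 -> 3, 3 -> 2, 3 -> 4, 5 -> 3, 6 -> 7, 7 -> 8, 8 -> 9, 9 -> 7)
  val vertices: Seq[Int] = 1 to 10
  val resetProb = 0.15

  // Each edge is weighted by 1 / outDegree(src).
  val outDegree: Map[Int, Int] =
    edges.groupBy(_._1).map { case (src, es) => (src, es.size) }

  // For every vertex, sum value(src) / outDegree(src) over its incoming edges.
  def msgSums(value: Map[Int, Double]): Map[Int, Double] = {
    val sums = edges.groupBy(_._2).map { case (dst, es) =>
      (dst, es.map { case (src, _) => value(src) / outDegree(src) }.sum)
    }
    vertices.map(v => (v, sums.getOrElse(v, 0.0))).toMap
  }

  // Fixed-iteration rule: messages carry full ranks, so resetProb is added at each step.
  def staticRanks(numIter: Int): Map[Int, Double] = {
    var ranks: Map[Int, Double] = vertices.map(v => (v, 0.0)).toMap
    for (_ <- 1 to numIter) {
      val sums = msgSums(ranks)
      ranks = vertices.map(v => (v, resetProb + (1.0 - resetProb) * sums(v))).toMap
    }
    ranks
  }

  // Delta rule: messages carry only the last change in rank, so oldPR is added at each step.
  def deltaRanks(numIter: Int): Map[Int, Double] = {
    // State after the first step (assumption stated above): rank = delta = resetProb.
    var state: Map[Int, (Double, Double)] =
      vertices.map(v => (v, (resetProb, resetProb))).toMap
    for (_ <- 2 to numIter) {
      val sums = msgSums(state.map { case (v, (_, delta)) => (v, delta) })
      state = state.map { case (v, (oldPR, _)) =>
        val newPR = oldPR + (1.0 - resetProb) * sums(v)
        (v, (newPR, newPR - oldPR))
      }
    }
    state.map { case (v, (pr, _)) => (v, pr) }
  }

  def main(args: Array[String]): Unit = {
    val s = staticRanks(200)
    val d = deltaRanks(200)
    vertices.foreach(v => println(f"$v%2d static=${s(v)}%.6f delta=${d(v)}%.6f"))
  }
}
```

Under these assumptions the two rules produce the same ranks at every iteration, because the deltas sum to the full rank. For the acyclic vertices 1 through 4 the result agrees, up to floating-point rounding, with the "Page Ranks Before Patch" values listed in this thread, which is consistent with the original `oldPR` update being correct.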



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2100#issuecomment-53138797
  
    ok to test



[GitHub] spark pull request: Error in Page Rank Computation in PageRank.sca...

Posted by pfontana3w2 <gi...@git.apache.org>.
Github user pfontana3w2 closed the pull request at:

    https://github.com/apache/spark/pull/2100

