Posted to reviews@spark.apache.org by moustaki <gi...@git.apache.org> on 2015/10/30 23:47:54 UTC

[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

GitHub user moustaki opened a pull request:

    https://github.com/apache/spark/pull/9386

    [SPARK-11432][GraphX] Personalized PageRank shouldn't use uniform initialization

    Changes the personalized PageRank initialization to be non-uniform: only the source vertex starts with a non-zero rank.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/moustaki/spark personalized-pagerank-init

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9386.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9386
    
----
commit d7f2edcab554adf45be9e43856f7665a1aa0e61c
Author: Yves Raimond <yr...@netflix.com>
Date:   2015-10-30T22:44:38Z

    Fixing SPARK-11432 - non-uniform initialization for personalized pagerank

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43562867
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -202,10 +211,15 @@ object PageRank extends Logging {
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr )
           // Set the vertex attributes to (initalPR, delta = 0)
    -      .mapVertices( (id, attr) => (0.0, 0.0) )
    +      .mapVertices( (id, attr) => {
    --- End diff --
    
    ditto




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152668948
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dwmclary <gi...@git.apache.org>.
Github user dwmclary commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153207454
  
    Yes, I agree, it shouldn't add overhead.
    
    > On Nov 2, 2015, at 4:35 PM, DB Tsai <no...@github.com> wrote:
    > 
    > I think the branching should be fine since it's just initialization.





[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153234294
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43562826
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -104,17 +104,25 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId isDefined
    --- End diff --
    
    Add extra empty line after this. 




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dwmclary <gi...@git.apache.org>.
Github user dwmclary commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153156750
  
    If I recall, we specifically decided against a conditional in the BSP function at that point because the branching might cause hotspots. If that's still a concern, maybe @jegonzal can comment. Otherwise, this looks good to me -- nice catch.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153234307
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43600008
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -104,17 +104,25 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId isDefined
         // Initialize the PageRank graph with each edge attribute having
    -    // weight 1/outDegree and each vertex with attribute 1.0.
    +    // weight 1/outDegree and each vertex with attribute resetProb.
    +    // When running personalized pagerank, only the source vertex
    +    // has an attribute resetProb. All others are set to 0.
         var rankGraph: Graph[Double, Double] = graph
           // Associate the degree with each vertex
           .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
           // Set the vertex attributes to the initial pagerank values
    -      .mapVertices( (id, attr) => resetProb )
    +      .mapVertices( (id, attr) => {
    +        if (personalized) {
    +          if (id == srcId.get) resetProb else 0.0
    +        } else {
    +          resetProb
    +        }
    +      })
    --- End diff --
    
    With src, let's do
    
    ```scala
          .mapVertices { (id, attr) => 
              if (id == src) resetProb else 0.0
          }
    ```




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by moustaki <gi...@git.apache.org>.
Github user moustaki commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43713256
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -104,17 +104,25 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId isDefined
         // Initialize the PageRank graph with each edge attribute having
    -    // weight 1/outDegree and each vertex with attribute 1.0.
    +    // weight 1/outDegree and each vertex with attribute resetProb.
    +    // When running personalized pagerank, only the source vertex
    +    // has an attribute resetProb. All others are set to 0.
         var rankGraph: Graph[Double, Double] = graph
           // Associate the degree with each vertex
           .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
           // Set the vertex attributes to the initial pagerank values
    -      .mapVertices( (id, attr) => resetProb )
    +      .mapVertices( (id, attr) => {
    +        if (personalized) {
    +          if (id == srcId.get) resetProb else 0.0
    +        } else {
    +          resetProb
    +        }
    +      })
    --- End diff --
    
    Sorry @dbtsai, I'm not sure I understand this comment. Where does `src` get unboxed in that example?




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by moustaki <gi...@git.apache.org>.
Github user moustaki commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153237760
  
    Many thanks @dbtsai 




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152670478
  
    **[Test build #44711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44711/consoleFull)** for PR 9386 at commit [`d7f2edc`](https://github.com/apache/spark/commit/d7f2edcab554adf45be9e43856f7665a1aa0e61c).




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by moustaki <gi...@git.apache.org>.
Github user moustaki commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153234212
  
    Jenkins, ok to test.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152937577
  
    It seems that the implementation of personalized PageRank doesn't entirely follow Twitter's paper when setting the initial PageRank values. In the paper, only the source node should be activated. This PR addresses this issue.
    
    +cc @dwmclary @ankurdave @rxin @jegonzal for more feedback.
    
    Thanks.
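
As an illustration of the point above, here is a minimal sketch of the two initialization modes in plain Scala, without any Spark dependency. The object and function names (`PersonalizedInitSketch`, `initialRank`, `initialRanks`) are hypothetical and are not part of the patch; only the branch logic is taken from the diff under review.

```scala
object PersonalizedInitSketch {
  // GraphX defines VertexId as an alias for Long; redefined here so the sketch is self-contained.
  type VertexId = Long

  // Initial rank for a single vertex, mirroring the branch in the diff:
  // personalized runs activate only the source, global runs start uniformly.
  def initialRank(id: VertexId, srcId: Option[VertexId], resetProb: Double): Double =
    srcId match {
      case Some(src) => if (id == src) resetProb else 0.0
      case None      => resetProb
    }

  // Hypothetical helper: apply the rule to a list of vertex ids.
  def initialRanks(
      ids: Seq[VertexId],
      srcId: Option[VertexId],
      resetProb: Double = 0.15): Map[VertexId, Double] =
    ids.map(id => id -> initialRank(id, srcId, resetProb)).toMap
}
```

With `srcId = Some(2L)` only vertex 2 receives `resetProb`; with `srcId = None` every vertex does.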




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153234819
  
    **[Test build #44875 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44875/consoleFull)** for PR 9386 at commit [`ebf0726`](https://github.com/apache/spark/commit/ebf072698318c90393eac95d8244d18b529ea71e).




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153236172
  
    **[Test build #44875 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44875/consoleFull)** for PR 9386 at commit [`ebf0726`](https://github.com/apache/spark/commit/ebf072698318c90393eac95d8244d18b529ea71e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43599996
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -104,17 +104,25 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId isDefined
         // Initialize the PageRank graph with each edge attribute having
    -    // weight 1/outDegree and each vertex with attribute 1.0.
    +    // weight 1/outDegree and each vertex with attribute resetProb.
    +    // When running personalized pagerank, only the source vertex
    +    // has an attribute resetProb. All others are set to 0.
         var rankGraph: Graph[Double, Double] = graph
           // Associate the degree with each vertex
           .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
           // Set the vertex attributes to the initial pagerank values
    -      .mapVertices( (id, attr) => resetProb )
    +      .mapVertices( (id, attr) => {
    +        if (personalized) {
    +          if (id == srcId.get) resetProb else 0.0
    +        } else {
    +          resetProb
    +        }
    +      })
     
    -    val personalized = srcId isDefined
         val src: VertexId = srcId.getOrElse(-1L)
    --- End diff --
    
    Move `val src: VertexId = srcId.getOrElse(-1L)` up, next to `val personalized = srcId isDefined`.
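
One way to read this suggestion, sketched in plain Scala: unwrap the `Option` once, outside the per-vertex function, so the closure only compares primitive `Long` values. `HoistedSrcSketch` and `initializer` are hypothetical names, and the condition shown is an assumption chosen to keep the non-personalized case uniform; it is not necessarily the code that was merged.

```scala
object HoistedSrcSketch {
  type VertexId = Long // same alias as in GraphX

  // Builds the per-vertex initializer. The Option is inspected once, here,
  // rather than on every call of the returned function.
  def initializer(srcId: Option[VertexId], resetProb: Double): VertexId => Double = {
    val personalized = srcId.isDefined
    // -1L is the sentinel used in the diff; it only matters when personalized is true.
    val src: VertexId = srcId.getOrElse(-1L)
    (id: VertexId) => if (!personalized || id == src) resetProb else 0.0
  }
}
```

The `personalized` flag is still checked because comparing against the sentinel alone would assign 0.0 to every vertex in a non-personalized run.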




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by jegonzal <gi...@git.apache.org>.
Github user jegonzal commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153207973
  
    This is actually a pretty serious error since it could lead to mass being accumulated on unreachable sub-graphs.  The performance implications of the above branch should be negligible. 
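
To make the concern about unreachable sub-graphs concrete, here is a toy simulation in plain Scala. It is a simplified sketch, not the GraphX implementation: it assumes every vertex is recomputed on each iteration as `reset + (1 - resetProb) * sum(rank(u) / outDegree(u))` over incoming edges, with `reset` equal to `resetProb` for the source and 0.0 otherwise. The names `UnreachableMassSketch` and `run` are hypothetical.

```scala
object UnreachableMassSketch {
  type VertexId = Long

  // Runs numIter synchronous iterations of a simplified personalized PageRank.
  // Assumes every vertex that appears in `edges` has an entry in `init`.
  def run(
      edges: Seq[(VertexId, VertexId)],
      init: Map[VertexId, Double],
      src: VertexId,
      resetProb: Double,
      numIter: Int): Map[VertexId, Double] = {
    // Out-degree of every vertex that has at least one outgoing edge.
    val outDeg: Map[VertexId, Int] =
      edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    var ranks = init
    for (_ <- 0 until numIter) {
      // For each destination, sum rank(u) / outDegree(u) over edges u -> dst.
      val incoming: Map[VertexId, Double] = edges
        .groupBy(_._2)
        .map { case (dst, es) => dst -> es.map { case (u, _) => ranks(u) / outDeg(u) }.sum }
      ranks = ranks.map { case (v, _) =>
        val reset = if (v == src) resetProb else 0.0
        v -> (reset + (1.0 - resetProb) * incoming.getOrElse(v, 0.0))
      }
    }
    ranks
  }
}
```

Take the edges 1 -> 2, 2 -> 1, 3 -> 4, 4 -> 3 with source 1, so vertices 3 and 4 are unreachable from the source. Starting every vertex at `resetProb` leaves 3 and 4 with a non-zero rank after any finite number of iterations in this sketch, while starting only the source at `resetProb` keeps them at exactly 0.0.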




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9386




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152669038
  
    Jenkins, ok to test.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152672761
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153236211
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44875/
    Test PASSed.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153236210
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152669575
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-188436113
  
    I backported this patch to branch-1.5 and 1.4.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43600170
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -202,10 +211,15 @@ object PageRank extends Logging {
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr )
           // Set the vertex attributes to (initalPR, delta = 0)
    -      .mapVertices( (id, attr) => (0.0, 0.0) )
    +      .mapVertices( (id, attr) => {
    +        if (personalized && id == srcId.get) {
    +          (resetProb, Double.NegativeInfinity)
    +        } else {
    +          (0.0, 0.0)
    +        }
    +      } )
           .cache()
     
    -    val personalized = srcId.isDefined
         val src: VertexId = srcId.getOrElse(-1L)
    --- End diff --
    
    Move up `src`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43562827
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -104,17 +104,25 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId isDefined
         // Initialize the PageRank graph with each edge attribute having
    -    // weight 1/outDegree and each vertex with attribute 1.0.
    +    // weight 1/outDegree and each vertex with attribute resetProb.
    +    // When running personalized pagerank, only the source vertex
    +    // has an attribute resetProb. All others are set to 0.
         var rankGraph: Graph[Double, Double] = graph
           // Associate the degree with each vertex
           .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
           // Set the vertex attributes to the initial pagerank values
    -      .mapVertices( (id, attr) => resetProb )
    +      .mapVertices( (id, attr) => {
    +        if (personalized) {
    +          if (id == srcId.get) resetProb else 0.0
    +        } else {
    +          resetProb
    +        }
    +      })
    --- End diff --
    
    ```scala
          .mapVertices { (id, attr) => 
            if (personalized) {
              if (id == srcId.get) resetProb else 0.0
            } else {
              resetProb
            }
          }
    ```




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152672696
  
    **[Test build #44711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44711/consoleFull)** for PR 9386 at commit [`d7f2edc`](https://github.com/apache/spark/commit/d7f2edcab554adf45be9e43856f7665a1aa0e61c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152668596
  
    Jenkins, add to whitelist.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43562852
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -192,6 +200,7 @@ object PageRank extends Logging {
           graph: Graph[VD, ED], tol: Double, resetProb: Double = 0.15,
           srcId: Option[VertexId] = None): Graph[Double, Double] =
       {
    +    val personalized = srcId.isDefined
    --- End diff --
    
    ditto




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153236349
  
    LGTM. Thanks, merged into master.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9386#discussion_r43600052
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
    @@ -202,10 +211,15 @@ object PageRank extends Logging {
           // Set the weight on the edges based on the degree
           .mapTriplets( e => 1.0 / e.srcAttr )
           // Set the vertex attributes to (initialPR, delta = 0)
    -      .mapVertices( (id, attr) => (0.0, 0.0) )
    +      .mapVertices( (id, attr) => {
    +        if (personalized && id == srcId.get) {
    +          (resetProb, Double.NegativeInfinity)
    +        } else {
    +          (0.0, 0.0)
    +        }
    +      } )
    --- End diff --
    
    ditto~
    
    ```scala
    if (id == src) (resetProb, Double.NegativeInfinity) else (0.0, 0.0)
    ```
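
The same initialization can be tried outside Spark. This is a sketch under stated assumptions: `InitialRankAndDelta` and `initialState` are hypothetical names, `VertexId` is assumed to be a `Long`, and `srcId.contains(id)` is used in place of the `src` value in the one-liner above, whose definition is not shown in this thread.

```scala
object InitialRankAndDelta {
  // Assumption: mirrors the GraphX alias, where a vertex id is a Long.
  type VertexId = Long

  // Initial (rank, delta) pair of a single vertex (hypothetical helper, not part of the PR).
  // srcId.contains(id) is true only when a source is given and it equals id,
  // which matches `personalized && id == srcId.get` in the diff.
  def initialState(id: VertexId, srcId: Option[VertexId], resetProb: Double): (Double, Double) =
    if (srcId.contains(id)) (resetProb, Double.NegativeInfinity) else (0.0, 0.0)

  def main(args: Array[String]): Unit = {
    println(initialState(7L, Some(7L), 0.15)) // (0.15,-Infinity)
    println(initialState(8L, Some(7L), 0.15)) // (0.0,0.0)
    println(initialState(7L, None, 0.15))     // (0.0,0.0)
  }
}
```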
    




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by dbtsai <gi...@git.apache.org>.
Github user dbtsai commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-153204018
  
    I think the branching should be fine since it's just initialization. 




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152669585
  
    Merged build started.




[GitHub] spark pull request: [SPARK-11432][GraphX] Personalized PageRank sh...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9386#issuecomment-152672762
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44711/

