You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by uncleGen <gi...@git.apache.org> on 2014/09/03 15:31:33 UTC

[GitHub] spark pull request: [GraphX]: trim some useless informations of Ve...

GitHub user uncleGen opened a pull request:

    https://github.com/apache/spark/pull/2249

    [GraphX]: trim some useless informations of VertexRDD in some cases

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/uncleGen/spark master_graphx-trim

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2249
    
----
commit 09b059e61da550e03f2d497d370e7bd2c9e1832a
Author: uncleGen <hu...@gmail.com>
Date:   2014-09-03T13:29:07Z

    [GraphX]: trim some useless informations of VertexRDD in some cases

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-56428627
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-54399143
  
    @ankurdave Thanks for you comments, I will update it as soon as possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-56429358
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20662/consoleFull) for   PR 2249 at commit [`0138984`](https://github.com/apache/spark/commit/0138984c9fa694aae5d19b0b540d0e8a997edee1).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-81433144
  
    @ankurdave This PR has gone stale. Since GraphX has graduated from alpha, do we need to close this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-63654065
  
    @ankurdave Hi, can you review it again. Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-54306441
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19662/consoleFull) for   PR 2249 at commit [`09b059e`](https://github.com/apache/spark/commit/09b059e61da550e03f2d497d370e7bd2c9e1832a).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  protected trait YarnAllocateResponse `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-56441469
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20662/consoleFull) for   PR 2249 at commit [`0138984`](https://github.com/apache/spark/commit/0138984c9fa694aae5d19b0b540d0e8a997edee1).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-81476933
  
    @uncleGen Yeah, unfortunately we can't add optional parameters anymore, and workarounds like adding a set of new methods seem too messy. We could just add a single new method `Graph#reindex` that rebuilds the routing table. But users can already just do `Graph(g.vertices, g.edges)` to get the same effect, though with a performance penalty for rebuilding the rest of the data structures as well.
    
    So let's close this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: Filtering operations sho...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen closed the pull request at:

    https://github.com/apache/spark/pull/2249


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [GraphX]: trim some useless informations of Ve...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-54296697
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19662/consoleFull) for   PR 2249 at commit [`09b059e`](https://github.com/apache/spark/commit/09b059e61da550e03f2d497d370e7bd2c9e1832a).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

Posted by uncleGen <gi...@git.apache.org>.
Github user uncleGen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2249#discussion_r17095835
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/Graph.scala ---
    @@ -262,13 +262,61 @@ abstract class Graph[VD: ClassTag, ED: ClassTag] protected () extends Serializab
         : Graph[VD, ED]
     
       /**
    +   * Restricts the graph to only the vertices and edges satisfying the predicates. The resulting
    +   * subgraph satisifies
    +   *
    +   * {{{
    +   * V' = {v : for all v in V where vpred(v)}
    +   * E' = {(u,v): for all (u,v) in E where epred((u,v)) && vpred(u) && vpred(v)}
    +   * }}}
    +   *
    +   * @param epred the edge predicate, which takes a triplet and
    +   * evaluates to true if the edge is to remain in the subgraph.  Note
    +   * that only edges where both vertices satisfy the vertex
    +   * predicate are considered.
    +   *
    +   * @param vpred the vertex predicate, which takes a vertex object and
    +   * evaluates to true if the vertex is to be included in the subgraph
    +   *
    +   * @param updateRoutingTable whether to rebuild the routingTable in
    +   * VertexRDD to reduce the shuffle when shipping VertexAttributes.
    +   *
    +   * @return the subgraph containing only the vertices and edges that
    +   * satisfy the predicates
    +   */
    +  def subgraph(
    +      epred: EdgeTriplet[VD,ED] => Boolean = (x => true),
    +      vpred: (VertexId, VD) => Boolean = ((v, d) => true),
    +      updateRoutingTable: Boolean)
    +  : Graph[VD, ED]
    --- End diff --
    
    something wrong here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3373][GraphX]: trim some useless inform...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/2249#issuecomment-54371860
  
    Thanks for this PR - it addresses a very good point. I updated [SPARK-3373](https://issues.apache.org/jira/browse/SPARK-3373) with details on the problem that I think you're addressing.
    
    A few minor comments:
    
    1. For binary compatibility, rather than adding an optional flag to each filtering operation, it would be better to add an overload that requires the flag and have the original version call this version with the flag's default value.
    2. The name `trim` doesn't give much guidance on when to use it, and it sounds like it's a cheap operation when in fact it involves a shuffle to rebuild the routing table. How about something like `highlySelective` (since I think we want to do this when the filtering is highly selective) or simply `rebuildRoutingTables`?
    3. It would be better to factor out the `if (trim) ...` conditional into a GraphImpl constructor, maybe by adding a `rebuildRoutingTables` flag to `GraphImpl.fromExistingRDDs`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org