Posted to reviews@spark.apache.org by luluorta <gi...@git.apache.org> on 2014/08/04 09:11:25 UTC

[GitHub] spark pull request: fix GraphX EdgeRDD zipPartitions

GitHub user luluorta opened a pull request:

    https://github.com/apache/spark/pull/1763

    fix GraphX EdgeRDD zipPartitions

    If the user sets `spark.default.parallelism` to a value different from the number of EdgeRDD partitions, GraphX jobs throw:
    java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
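The failure mode described above can be illustrated with a toy model in plain Scala, with no Spark dependency. It only mimics the partition-count check that `RDD.zipPartitions` performs. It is not the patch in this pull request. The setup is hypothetical: edge data split into 4 partitions, and a second dataset split into 3 partitions because `spark.default.parallelism` is 3. `ZipPartitionsModel` and `split` are illustrative names, not Spark or GraphX APIs.

```scala
// Toy model in plain Scala (no Spark dependency): an "RDD" is represented as a
// Vector of partitions, and each partition is a Vector of elements.
object ZipPartitionsModel {
  type Partitions[A] = Vector[Vector[A]]

  // Mimics the precondition of Spark's RDD.zipPartitions: both inputs must have
  // the same number of partitions. In Spark the check runs when the zipped
  // RDD's partitions are first requested; this model checks eagerly.
  def zipPartitions[A, B, C](left: Partitions[A], right: Partitions[B])(
      f: (Iterator[A], Iterator[B]) => Iterator[C]): Partitions[C] = {
    if (left.length != right.length)
      throw new IllegalArgumentException(
        "Can't zip RDDs with unequal numbers of partitions")
    // Partition i of the result is built from partition i of each input.
    left.zip(right).map { case (l, r) => f(l.iterator, r.iterator).toVector }
  }

  // Hypothetical helper: deal elements round-robin into n partitions, standing
  // in for data whose partition count comes from, e.g., spark.default.parallelism.
  def split[A](xs: Seq[A], n: Int): Partitions[A] = {
    require(n > 0, "number of partitions must be positive")
    Vector.tabulate(n) { i =>
      xs.zipWithIndex.collect { case (x, j) if j % n == i => x }.toVector
    }
  }
}
```

With 4 partitions on one side and 3 on the other, `zipPartitions` throws. When both sides use the same count, partitions pair up index by index.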

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/luluorta/spark fix-graph-zip

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1763.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1763
    
----
commit 83389614959fb2c84b947362af1e0babbfe767d5
Author: luluorta <lu...@gmail.com>
Date:   2014-08-04T07:03:17Z

    fix GraphX EdgeRDD zipPartitions

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1763


[GitHub] spark pull request: [SPARK-2823]fix GraphX EdgeRDD zipPartitions

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-51024735
  
    Can one of the admins verify this patch?


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by luluorta <gi...@git.apache.org>.
Github user luluorta commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-68027993
  
    Thanks, @Earne 
    
    Actually, there is already a way to customize the number of EdgeRDD partitions: `Graph.partitionBy` [Graph.scala#L136](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala#L136).
    
    I think a better name for the parameter passed to `coalesce(numEdgePartitions)` would be `maxEdgePartitions`, because it is used to ensure that the generated EdgeRDD has no more than `maxEdgePartitions` partitions.
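This minimal plain-Scala sketch, with no Spark dependency, shows why the `coalesce` argument behaves as a maximum rather than an exact partition count. `CoalesceModel` and `maxPartitionsAfterCoalesce` are hypothetical names. The rule they encode follows the `RDD.coalesce` documentation: without a shuffle, requesting more partitions than currently exist keeps the current number.

```scala
// Toy model in plain Scala (no Spark dependency) of how many partitions
// RDD.coalesce can produce.
object CoalesceModel {
  // Returns an upper bound on the partition count after
  // coalesce(requested, shuffle). Without a shuffle, coalesce can only merge
  // existing partitions, so requesting more than `current` leaves the count at
  // `current`. With a shuffle, the data is redistributed into `requested`
  // partitions. This is a bound, not a guaranteed exact count.
  def maxPartitionsAfterCoalesce(current: Int, requested: Int,
      shuffle: Boolean = false): Int = {
    require(current > 0 && requested > 0, "partition counts must be positive")
    if (shuffle) requested else math.min(current, requested)
  }
}
```

For example, if an input RDD has 2 partitions and 4 are requested without a shuffle, the result keeps 2 partitions. The numbers are hypothetical.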


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-54244869
  
    Thanks! I added a test, verified that it failed before and succeeds now, and merged this into master, branch-1.1, and branch-1.0.


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-51098456
  
    ok to test


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-51098993
  
    QA tests have started for PR 1763. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by Earne <gi...@git.apache.org>.
Github user Earne commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-67954996
  
    @ankurdave I think you missed this PR in [Extract interfaces for EdgeRDD and VertexRDD](https://github.com/apache/spark/pull/2530).
    
    [SPARK-2823](https://issues.apache.org/jira/browse/SPARK-2823) was reopened due to this. 
    Can we just have a *numPartitions* parameter in EdgeRDD.scala, as [VertexRDD#L313](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala#L313) does?
    Is **coalesce** necessary in [GraphLoader#L70](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala#L70)? After `coalesce(numEdgePartitions)`, the RDD *may not have* `partitions.length == numEdgePartitions`.
    ![coalesce](https://cloud.githubusercontent.com/assets/1490540/5538583/5e41c9ca-8af1-11e4-8db4-bc355c1b3ee8.PNG)



[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-52378439
  
    Sorry for the delay on this. It would be great if the PR also added a unit test to reproduce the bug. I can add that if you don't have time.


[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1763#issuecomment-51106495
  
    QA results for PR 1763:
    - This patch PASSES unit tests.
    - This patch merges cleanly.
    - This patch adds no public classes.

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull

