Posted to reviews@spark.apache.org by luluorta <gi...@git.apache.org> on 2014/08/04 09:11:25 UTC
[GitHub] spark pull request: fix GraphX EdgeRDD zipPartitions
GitHub user luluorta opened a pull request:
https://github.com/apache/spark/pull/1763
fix GraphX EdgeRDD zipPartitions
If users set “spark.default.parallelism” to a value that differs from the number of EdgeRDD partitions, GraphX jobs will throw:
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions
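The exception itself comes from Spark's `zipPartitions`, which requires both RDDs to have the same number of partitions. The following is a minimal sketch of that underlying failure mode using plain RDDs rather than GraphX; the object name `ZipPartitionsMismatch`, the helper `zipFails`, and the partition counts are illustrative assumptions, not code from this pull request.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object (not part of the PR): shows that zipPartitions
// rejects RDDs whose partition counts differ.
object ZipPartitionsMismatch {

  /** Returns true if zipping two RDDs with the given partition counts fails. */
  def zipFails(sc: SparkContext, leftParts: Int, rightParts: Int): Boolean = {
    val left = sc.parallelize(1 to 100, leftParts)
    val right = sc.parallelize(1 to 100, rightParts)
    // zipPartitions is lazy; the partition-count check runs when an action
    // forces Spark to compute the partitions of the zipped RDD.
    val zipped = left.zipPartitions(right) { (a, b) => Iterator(a.size + b.size) }
    try {
      zipped.count()
      false
    } catch {
      case _: IllegalArgumentException => true
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("zip-mismatch-sketch")
    val sc = new SparkContext(conf)
    try {
      println(s"4 vs 4 partitions fails: ${zipFails(sc, 4, 4)}")
      println(s"4 vs 8 partitions fails: ${zipFails(sc, 4, 8)}")
    } finally {
      sc.stop()
    }
  }
}
```

In GraphX the mismatch presumably arises the same way: one RDD is built with the default parallelism while the EdgeRDD keeps its own partition count, and the two are then zipped.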
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/luluorta/spark fix-graph-zip
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1763.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1763
----
commit 83389614959fb2c84b947362af1e0babbfe767d5
Author: luluorta <lu...@gmail.com>
Date: 2014-08-04T07:03:17Z
fix GraphX EdgeRDD zipPartitions
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1763
[GitHub] spark pull request: [SPARK-2823]fix GraphX EdgeRDD zipPartitions
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-51024735
Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by luluorta <gi...@git.apache.org>.
Github user luluorta commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-68027993
Thanks, @Earne
Actually, we already have a way to customize the number of EdgeRDD partitions: `Graph.partitionBy` [Graph.scala#L136](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Graph.scala#L136).
I guess a better name for the parameter of `coalesce(numEdgePartitions)` would be `maxEdgePartitions`, because it ensures that the generated EdgeRDD has no more than `maxEdgePartitions` partitions.
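To illustrate the point about `coalesce` acting as an upper bound, here is a small sketch using a plain RDD. The object name `CoalesceUpperBound`, the helper `partitionsAfterCoalesce`, and the partition counts are assumptions made for the example; only `parallelize`, `coalesce`, and `partitions` are actual Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object (not part of GraphX): reports how many
// partitions an RDD has after coalesce.
object CoalesceUpperBound {

  /** Partition count after coalescing an RDD that starts with `initial` partitions. */
  def partitionsAfterCoalesce(
      sc: SparkContext,
      initial: Int,
      requested: Int,
      shuffle: Boolean = false): Int =
    sc.parallelize(1 to 1000, initial).coalesce(requested, shuffle).partitions.length

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("coalesce-sketch")
    val sc = new SparkContext(conf)
    try {
      // Without a shuffle, coalesce can only reduce the partition count.
      println(s"16 -> 4: ${partitionsAfterCoalesce(sc, 16, 4)}")
      println(s"4 -> 16: ${partitionsAfterCoalesce(sc, 4, 16)}")
      // With shuffle = true, the partition count can also grow.
      println(s"4 -> 16 (shuffle): ${partitionsAfterCoalesce(sc, 4, 16, shuffle = true)}")
    } finally {
      sc.stop()
    }
  }
}
```

Because the non-shuffling form never increases the partition count, code that later assumes `partitions.length == numEdgePartitions` can be wrong whenever the input has fewer partitions than requested.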
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-54244869
Thanks! I added a test, verified that it failed before and succeeds now, and merged this into master, branch-1.1, and branch-1.0.
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-51098456
ok to test
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-51098993
QA tests have started for PR 1763. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by Earne <gi...@git.apache.org>.
Github user Earne commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-67954996
@ankurdave I think you missed this PR when you did [Extract interfaces for EdgeRDD and VertexRDD](https://github.com/apache/spark/pull/2530).
[SPARK-2823](https://issues.apache.org/jira/browse/SPARK-2823) was reopened due to this.
Can we just have a param *numPartitions* in EdgeRDD.scala, like [VertexRDD#L313](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala#L313) does?
Is **coalesce** necessary in [GraphLoader#L70](https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphLoader.scala#L70)? The RDD after coalesce(numEdgePartitions) *may not have* partitions.length == numEdgePartitions
![coalesce](https://cloud.githubusercontent.com/assets/1490540/5538583/5e41c9ca-8af1-11e4-8db4-bc355c1b3ee8.PNG)
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by ankurdave <gi...@git.apache.org>.
Github user ankurdave commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-52378439
Sorry for the delay on this. It would be great if the PR also added a unit test to reproduce the bug. I can add that if you don't have time.
[GitHub] spark pull request: [SPARK-2823][GraphX]fix GraphX EdgeRDD zipPart...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1763#issuecomment-51106495
QA results for PR 1763:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17863/consoleFull