You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tmyklebu <gi...@git.apache.org> on 2014/04/15 00:18:49 UTC
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
GitHub user tmyklebu opened a pull request:
https://github.com/apache/spark/pull/407
[SPARK-1281] Improve partitioning in ALS
ALS was using HashPartitioner and explicit uses of `%` together. Further, the naked use of `%` meant that, if the number of partitions corresponded with the stride of arithmetic progressions appearing in user and product ids, users and products could be mapped into buckets in an unfair or unwise way.
This pull request:
1) Makes the Partitioner an instance variable of ALS.
2) Replaces the direct uses of `%` with calls to a Partitioner.
3) Defines an anonymous Partitioner that scrambles the bits of the object's hashCode before reducing to the number of present buckets.
This pull request does not make the partitioner user-configurable.
I'm not all that happy about the way I did (1). It introduces an icky lifetime issue and dances around it by nulling something. However, I don't know a better way to make the partitioner visible everywhere it needs to be visible.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tmyklebu/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/407.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #407
----
commit c774d7d4bff91c9387d059d1189799fa0ff1f4b0
Author: Tor Myklebust <tm...@gmail.com>
Date: 2014-04-14T22:01:18Z
Make the partitioner a member variable and use it instead of modding directly.
commit c90b6d8e91f86cf89adf28de6f9185647c87e5c8
Author: Tor Myklebust <tm...@gmail.com>
Date: 2014-04-14T22:10:30Z
Scramble user and product ids before bucketing.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909365
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40633279
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11693748
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = data._1.toArray
+ val correct = data._2
+ for (i <- 0 until ratings.length) {
+ var u = ratings(i).user
+ var p = ratings(i).product
+ var r = ratings(i).rating
+ u = u - 25
+ p = p - 25
+ ratings(i) = new Rating(u,p,r)
+ }
+ val ratingsRDD = sc.parallelize(ratings)
+
+ val model = ALS.train(ratingsRDD, 5, 15)
+
+ var pairs = new Array[(Int, Int)](0)
--- End diff --
Same trick here:
~~~
val pairs = ratings.map { case Rating(i, j, r) =>
(i, j)
}
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425686
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14120/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40438453
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11643157
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
private var lambda: Double,
private var implicitPrefs: Boolean,
private var alpha: Double,
+ private var partitioner: Partitioner = null,
--- End diff --
Okay, let's make it simpler. Just a single partitioner for both users and products, customizable via a setter function.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-41073331
LGTM. @tmyklebu Thanks for fixing the partitioning in ALS!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11802224
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ private var partitioner: Partitioner = null
--- End diff --
Since partitioner is not configurable now, we don't need a private member. Setting the partitioner directly in `run` should suffice.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795354
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
+ val correct = data._2
+ val model = ALS.train(ratings, 5, 15)
+
+ val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
--- End diff --
space after `,` and spaces around `-`.
If you are not sure about the style, please follow https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide or check the code base for similar usages.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795397
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
+ val correct = data._2
+ val model = ALS.train(ratings, 5, 15)
+
+ val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
+ val ans = model.predict(sc.parallelize(pairs)).collect
+ ans.foreach { r =>
+ val u = r.user + 25
+ val p = r.product + 25
+ val v = r.rating
+ val error = v - correct.get(u, p)
+ assert(math.abs(error) < 0.4)
--- End diff --
We shouldn't assert on each prediction. One outlier could fail the task. Instead, compute RMSE and put an assertion on it. Or you can modify `testALS` by adding `negativeIds` to its arguments and use it in this test.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40626732
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40629414
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909000
Merged build finished.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40884378
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14264/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40910803
@markhamstra The NNLS-related ones. I think I made them disappear, though, by git push --force'ing an old copy of my master branch (that happened to contain exactly the commits i did want in).
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40630941
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883719
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40910281
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40525624
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11694575
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = data._1.toArray
+ val correct = data._2
+ for (i <- 0 until ratings.length) {
+ var u = ratings(i).user
+ var p = ratings(i).product
+ var r = ratings(i).rating
+ u = u - 25
+ p = p - 25
+ ratings(i) = new Rating(u,p,r)
+ }
+ val ratingsRDD = sc.parallelize(ratings)
+
+ val model = ALS.train(ratingsRDD, 5, 15)
+
+ var pairs = new Array[(Int, Int)](0)
--- End diff --
~~~
val pairs = Array.tabulate(50, 50)((u, p) => (u - 25, p - 25)).flatten
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40888823
Since we don’t separate user partitioner and product partitioner. With little chance a user can set a different partitioner that works magically better than the default, unless the user ids are separated from product ids, so the partitioner knows whether this is a user or a product. Maybe we should not let the partitioner be configurable. 'setNumBlocks' should update partitioner.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909034
Ugh. Why are those commits part of this PR now? How do I undo this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40438518
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14128/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/407
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11653978
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
}
- val partitioner = new HashPartitioner(numBlocks)
+ // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+ def hash(x: Int): Int = {
--- End diff --
OK; I changed all instances of the hash function to byteswap32.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883723
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40908556
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40629415
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14179/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11643032
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
}
- val partitioner = new HashPartitioner(numBlocks)
+ // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+ def hash(x: Int): Int = {
--- End diff --
Yes, let's change it. We need a fast hash function but not necessarily high quality. Using an existing implementation can simplify the code.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795333
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
--- End diff --
Ditto. There is no null case. Should return `x.asInstanceOf[Int] % 2` directly. Btw, should it be `% 3` to have three partitions? Or you want to test with an empty partition?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40592207
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11694652
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = data._1.toArray
+ val correct = data._2
+ for (i <- 0 until ratings.length) {
+ var u = ratings(i).user
+ var p = ratings(i).product
+ var r = ratings(i).rating
+ u = u - 25
+ p = p - 25
+ ratings(i) = new Rating(u,p,r)
+ }
+ val ratingsRDD = sc.parallelize(ratings)
+
+ val model = ALS.train(ratingsRDD, 5, 15)
+
+ var pairs = new Array[(Int, Int)](0)
+ for (u <- -25 until 25; p <- -25 until 25) {
+ pairs = pairs :+ (u, p)
+ }
+ val ans = model.predict(sc.parallelize(pairs)).collect
+ for (r <- ans) {
+ val u = r.user + 25
+ val p = r.product + 25
+ val v = r.rating
+ val error = v - correct.get(u,p)
--- End diff --
Put a space after `,`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40626298
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425865
Build failure. Looks like a config issue in Jenkins?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795311
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ private var partitioner: Partitioner = null
+
+ /** Sets the Partitioner that partitions users. */
--- End diff --
The same partitioner is also used in partitioning products.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11617633
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
}
- val partitioner = new HashPartitioner(numBlocks)
+ // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+ def hash(x: Int): Int = {
--- End diff --
Instead of implementing a hash function, shall we use some existing ones? For example, we may use `scala.util.hashing.byteswap32`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40910282
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11617460
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
private var lambda: Double,
private var implicitPrefs: Boolean,
private var alpha: Double,
+ private var partitioner: Partitioner = null,
--- End diff --
Do not put partitioner in constructor args. Use setters and make the hashPartitioner default. Also, should separate userPartitioner/numUserBlocks and productPartitioner/numProductBlocks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11631405
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
private var lambda: Double,
private var implicitPrefs: Boolean,
private var alpha: Double,
+ private var partitioner: Partitioner = null,
--- End diff --
I'll do this, but note that this adds considerable functionality above what's described in the bug I'm supposed to address.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11693495
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ var partitioner: Partitioner = null
--- End diff --
Make it private?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40525634
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425586
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795349
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
--- End diff --
If it makes the line too wide, change the code to
~~~
val ratings = sc.parallelize(data._1.map { case Rating(u, p, r) =>
Rating(u - 25, p - 25, r)
})
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40884380
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14263/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909001
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14272/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795330
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ private var partitioner: Partitioner = null
--- End diff --
I agree that that would be nice. However, the number of blocks isn't necessarily known at instantiation time.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40884376
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-41073792
Thanks, merged.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795406
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
+ val correct = data._2
+ val model = ALS.train(ratings, 5, 15)
+
+ val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
+ val ans = model.predict(sc.parallelize(pairs)).collect
+ ans.foreach { r =>
+ val u = r.user + 25
+ val p = r.product + 25
+ val v = r.rating
+ val error = v - correct.get(u, p)
+ assert(math.abs(error) < 0.4)
--- End diff --
That's what testALS() does. Monkey see, monkey do. (I can compute RMSE instead, of course; but any fragility here you'll also find in testALS.)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40438446
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11631197
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +169,24 @@ class ALS private (
this.numBlocks
}
- val partitioner = new HashPartitioner(numBlocks)
+ // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
+ def hash(x: Int): Int = {
--- End diff --
That hash function was already there. I moved it up a few lines. I can change the hash function as well if you like...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40970396
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795344
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
--- End diff --
OK; I'll remove the null case. % 2 was intentional; this test may as well cover the last-partition-is-empty case as well.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795373
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
+ val correct = data._2
+ val model = ALS.train(ratings, 5, 15)
+
+ val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
+ val ans = model.predict(sc.parallelize(pairs)).collect
--- End diff --
`collect` -> `collect()` because it is might change the internal state.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40911258
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14273/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40624603
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40624591
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795309
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ private var partitioner: Partitioner = null
--- End diff --
You should assign the default partitioner directly to make the code easy to read instead of assign it in `run()`.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795338
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
--- End diff --
Put a space after `,`. Put spaces around `-`
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883634
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40595739
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14174/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11693684
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = data._1.toArray
+ val correct = data._2
+ for (i <- 0 until ratings.length) {
--- End diff --
You can use pattern matching to simplify this block:
~~~
val ratings = data._1.map { case Rating(i, j, r) =>
Rating(i - 25, j - 25, r)
}
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40595738
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40974245
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40884375
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40637948
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14181/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40909366
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14271/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40438216
Jenkins, retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40970418
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40533843
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14153/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40630943
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14180/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40622319
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795347
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -114,6 +116,14 @@ class ALS private (
this
}
+ private var partitioner: Partitioner = null
+
+ /** Sets the Partitioner that partitions users. */
--- End diff --
Whoops.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40533842
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40908961
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40622331
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40637946
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40624807
Github thought some comments are outdated but they are not. Please click `show outdated diff` to see my comments.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40438517
Merged build finished.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40626733
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14178/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883661
Handled most of your comments. I didn't change the assert in the "negative ids" test and I left the partitioner null handling as is. I don't see a nicer way to handle the latter at the moment.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40633267
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11694640
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
@@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
assert(u11 != u2)
}
+ test("custom partitioner") {
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
+ testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
+ def numPartitions(): Int = 3
+ def getPartition(x: Any): Int = x match {
+ case null => 0
+ case _ => x.hashCode % 2
+ }
+ })
+ }
+
+ test("negative ids") {
+ val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
+ val ratings = data._1.toArray
+ val correct = data._2
+ for (i <- 0 until ratings.length) {
+ var u = ratings(i).user
+ var p = ratings(i).product
+ var r = ratings(i).rating
+ u = u - 25
+ p = p - 25
+ ratings(i) = new Rating(u,p,r)
+ }
+ val ratingsRDD = sc.parallelize(ratings)
+
+ val model = ALS.train(ratingsRDD, 5, 15)
+
+ var pairs = new Array[(Int, Int)](0)
+ for (u <- -25 until 25; p <- -25 until 25) {
+ pairs = pairs :+ (u, p)
+ }
+ val ans = model.predict(sc.parallelize(pairs)).collect
+ for (r <- ans) {
--- End diff --
Use `foreach` instead of `for`. The former is faster.
~~~
ans.foreach { r =>
...
}
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40592198
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40626274
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425685
Merged build finished.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40561969
@tmyklebu Could you add a test with negative user or product ids? You can use the data generator in ALSSuite and flip the signs of user and product ids.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40910729
@tmyklebu Which commits do you not want in this PR?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/407#discussion_r11795323
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -167,11 +177,23 @@ class ALS private (
this.numBlocks
}
- val partitioner = new HashPartitioner(numBlocks)
+ val defaultPartitioner = new Partitioner {
+ val numPartitions = numBlocks
+
+ def getPartition(x: Any): Int = x match {
+ case null => 0
--- End diff --
We know we are partitioning integers, so we don't need pattern matching. This function implementation can be changed to
~~~
Utils.nonNegativeMod(byteswap32(x.asInstanceOf[Int]), numPartitions)
~~~
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40908559
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40974247
All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14298/
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40425598
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40883632
Merged build triggered.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40911257
Merged build finished. All automated tests passed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40908962
Merged build started.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/407#issuecomment-40911136
Right, that's essentially how to do it: A forced push from a local checkout that is in the correct state. So, if you originally made the PR from a branch called `PRbranch` and subsequently pushed a commit from that branch that you shouldn't have, then one fix procedure would be to do the following starting from your up-to-date checkout of `PRbranch`:
git checkout -b Keeper
git checkout PRbranch
git reset --hard `SHA of the last commit that is supposed to be in the PR`
git push origin +PRbranch
After that, your local checkout, github repo, and the PR should all agree that `PRbranch` contains only the correct commits; and the new `Keeper` branch in your local repo has everything that is in `PRBranch` plus the extra commits.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---