You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tmyklebu <gi...@git.apache.org> on 2014/04/15 00:18:49 UTC

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

GitHub user tmyklebu opened a pull request:

    https://github.com/apache/spark/pull/407

    [SPARK-1281] Improve partitioning in ALS

    ALS was using HashPartitioner and explicit uses of `%` together.  Further, the naked use of `%` meant that, if the number of partitions corresponded with the stride of arithmetic progressions appearing in user and product ids, users and products could be mapped into buckets in an unfair or unwise way.
    
    This pull request:
    1) Makes the Partitioner an instance variable of ALS.
    2) Replaces the direct uses of `%` with calls to a Partitioner.
    3) Defines an anonymous Partitioner that scrambles the bits of the object's hashCode before reducing to the number of present buckets.
    
    This pull request does not make the partitioner user-configurable.
    
    I'm not all that happy about the way I did (1).  It introduces an icky lifetime issue and dances around it by nulling something.  However, I don't know a better way to make the partitioner visible everywhere it needs to be visible.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tmyklebu/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/407.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #407
    
----
commit c774d7d4bff91c9387d059d1189799fa0ff1f4b0
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-14T22:01:18Z

    Make the partitioner a member variable and use it instead of modding directly.

commit c90b6d8e91f86cf89adf28de6f9185647c87e5c8
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-14T22:10:30Z

    Scramble user and product ids before bucketing.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40909365
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40633279
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11693748
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = data._1.toArray
    +    val correct = data._2
    +    for (i <- 0 until ratings.length) {
    +      var u = ratings(i).user
    +      var p = ratings(i).product
    +      var r = ratings(i).rating
    +      u = u - 25
    +      p = p - 25
    +      ratings(i) = new Rating(u,p,r)
    +    }
    +    val ratingsRDD = sc.parallelize(ratings)
    +
    +    val model = ALS.train(ratingsRDD, 5, 15)
    +
    +    var pairs = new Array[(Int, Int)](0)
    --- End diff --
    
    Same trick here:
    ~~~
    val pairs = ratings.map { case Rating(i, j, r) =>
      (i, j)
    }
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40425686
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14120/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40438453
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11643157
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -96,6 +97,7 @@ class ALS private (
         private var lambda: Double,
         private var implicitPrefs: Boolean,
         private var alpha: Double,
    +    private var partitioner: Partitioner = null,
    --- End diff --
    
    Okay, let's make it simpler. Just a single partitioner for both users and products, customizable via a setter function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-41073331
  
    LGTM. @tmyklebu Thanks for fixing the partitioning in ALS!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11802224
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  private var partitioner: Partitioner = null
    --- End diff --
    
    Since partitioner is not configurable now, we don't need a private member. Setting the partitioner directly in `run` should suffice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795354
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    +    val correct = data._2
    +    val model = ALS.train(ratings, 5, 15)
    +
    +    val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
    --- End diff --
    
    space after `,` and spaces around `-`.
    
    If you are not sure about the style, please follow https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide or check the code base for similar usages.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795397
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    +    val correct = data._2
    +    val model = ALS.train(ratings, 5, 15)
    +
    +    val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
    +    val ans = model.predict(sc.parallelize(pairs)).collect
    +    ans.foreach { r =>
    +      val u = r.user + 25
    +      val p = r.product + 25
    +      val v = r.rating
    +      val error = v - correct.get(u, p)
    +      assert(math.abs(error) < 0.4)
    --- End diff --
    
    We shouldn't assert on each prediction. One outlier could fail the task. Instead, compute RMSE and put an assertion on it. Or you can modify `testALS` by adding `negativeIds` to its arguments and use it in this test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40626732
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40629414
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40909000
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40884378
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14264/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40910803
  
    @markhamstra The NNLS-related ones.  I think I made them disappear, though, by git push --force'ing an old copy of my master branch (that happened to contain exactly the commits i did want in).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40630941
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40883719
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40910281
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40525624
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11694575
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = data._1.toArray
    +    val correct = data._2
    +    for (i <- 0 until ratings.length) {
    +      var u = ratings(i).user
    +      var p = ratings(i).product
    +      var r = ratings(i).rating
    +      u = u - 25
    +      p = p - 25
    +      ratings(i) = new Rating(u,p,r)
    +    }
    +    val ratingsRDD = sc.parallelize(ratings)
    +
    +    val model = ALS.train(ratingsRDD, 5, 15)
    +
    +    var pairs = new Array[(Int, Int)](0)
    --- End diff --
    
    ~~~
    val pairs = Array.tabulate(50, 50)((u, p) => (u - 25, p - 25)).flatten
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40888823
  
    Since we don’t separate user partitioner and product partitioner. With little chance a user can set a different partitioner that works magically better than the default, unless the user ids are separated from product ids, so the partitioner knows whether this is a user or a product. Maybe we should not let the partitioner be configurable. 'setNumBlocks' should update partitioner.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40909034
  
    Ugh.  Why are those commits part of this PR now?  How do I undo this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40438518
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14128/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/407


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11653978
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -167,11 +169,24 @@ class ALS private (
           this.numBlocks
         }
     
    -    val partitioner = new HashPartitioner(numBlocks)
    +    // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
    +    def hash(x: Int): Int = {
    --- End diff --
    
    OK; I changed all instances of the hash function to byteswap32.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40883723
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40908556
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40629415
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14179/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11643032
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -167,11 +169,24 @@ class ALS private (
           this.numBlocks
         }
     
    -    val partitioner = new HashPartitioner(numBlocks)
    +    // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
    +    def hash(x: Int): Int = {
    --- End diff --
    
    Yes, let's change it. We need a fast hash function but not necessarily high quality. Using an existing implementation can simplify the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795333
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    --- End diff --
    
    Ditto. There is no null case. Should return `x.asInstanceOf[Int] % 2` directly. Btw, should it be `% 3` to have three partitions? Or you want to test with an empty partition?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40592207
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11694652
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = data._1.toArray
    +    val correct = data._2
    +    for (i <- 0 until ratings.length) {
    +      var u = ratings(i).user
    +      var p = ratings(i).product
    +      var r = ratings(i).rating
    +      u = u - 25
    +      p = p - 25
    +      ratings(i) = new Rating(u,p,r)
    +    }
    +    val ratingsRDD = sc.parallelize(ratings)
    +
    +    val model = ALS.train(ratingsRDD, 5, 15)
    +
    +    var pairs = new Array[(Int, Int)](0)
    +    for (u <- -25 until 25; p <- -25 until 25) {
    +      pairs = pairs :+ (u, p)
    +    }
    +    val ans = model.predict(sc.parallelize(pairs)).collect
    +    for (r <- ans) {
    +      val u = r.user + 25
    +      val p = r.product + 25
    +      val v = r.rating
    +      val error = v - correct.get(u,p)
    --- End diff --
    
    Put a space after `,`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40626298
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40425865
  
    Build failure.  Looks like a config issue in Jenkins?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795311
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  private var partitioner: Partitioner = null
    +
    +  /** Sets the Partitioner that partitions users. */
    --- End diff --
    
    The same partitioner is also used in partitioning products.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11617633
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -167,11 +169,24 @@ class ALS private (
           this.numBlocks
         }
     
    -    val partitioner = new HashPartitioner(numBlocks)
    +    // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
    +    def hash(x: Int): Int = {
    --- End diff --
    
    Instead of implementing a hash function, shall we use some existing ones? For example, we may use `scala.util.hashing.byteswap32`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40910282
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11617460
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -96,6 +97,7 @@ class ALS private (
         private var lambda: Double,
         private var implicitPrefs: Boolean,
         private var alpha: Double,
    +    private var partitioner: Partitioner = null,
    --- End diff --
    
    Do not put partitioner in constructor args. Use setters and make the hashPartitioner default. Also, should separate userPartitioner/numUserBlocks and productPartitioner/numProductBlocks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11631405
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -96,6 +97,7 @@ class ALS private (
         private var lambda: Double,
         private var implicitPrefs: Boolean,
         private var alpha: Double,
    +    private var partitioner: Partitioner = null,
    --- End diff --
    
    I'll do this, but note that this adds considerable functionality above what's described in the bug I'm supposed to address.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11693495
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  var partitioner: Partitioner = null
    --- End diff --
    
    Make it private?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40525634
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40425586
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795349
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    --- End diff --
    
    If it makes the line too wide, change the code to
    
    ~~~
      val ratings = sc.parallelize(data._1.map { case Rating(u, p, r) => 
          Rating(u - 25, p - 25, r)
        })
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40884380
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14263/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40909001
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14272/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795330
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  private var partitioner: Partitioner = null
    --- End diff --
    
    I agree that that would be nice.  However, the number of blocks isn't necessarily known at instantiation time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40884376
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-41073792
  
    Thanks, merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795406
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    +    val correct = data._2
    +    val model = ALS.train(ratings, 5, 15)
    +
    +    val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
    +    val ans = model.predict(sc.parallelize(pairs)).collect
    +    ans.foreach { r =>
    +      val u = r.user + 25
    +      val p = r.product + 25
    +      val v = r.rating
    +      val error = v - correct.get(u, p)
    +      assert(math.abs(error) < 0.4)
    --- End diff --
    
    That's what testALS() does.  Monkey see, monkey do.  (I can compute RMSE instead, of course; but any fragility here you'll also find in testALS.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40438446
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11631197
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -167,11 +169,24 @@ class ALS private (
           this.numBlocks
         }
     
    -    val partitioner = new HashPartitioner(numBlocks)
    +    // Hash an integer to propagate random bits at all positions, similar to java.util.HashTable
    +    def hash(x: Int): Int = {
    --- End diff --
    
    That hash function was already there.  I moved it up a few lines.  I can change the hash function as well if you like...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40970396
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795344
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    --- End diff --
    
    OK; I'll remove the null case.  % 2 was intentional; this test may as well cover the last-partition-is-empty case as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795373
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    +    val correct = data._2
    +    val model = ALS.train(ratings, 5, 15)
    +
    +    val pairs = Array.tabulate(50, 50)((u,p) => (u-25,p-25)).flatten
    +    val ans = model.predict(sc.parallelize(pairs)).collect
    --- End diff --
    
    `collect` -> `collect()` because it is might change the internal state.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40911258
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14273/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40624603
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40624591
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795309
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  private var partitioner: Partitioner = null
    --- End diff --
    
    You should assign the default partitioner directly to make the code easy to read instead of assign it in `run()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795338
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,34 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = sc.parallelize(data._1.map { case Rating(u,p,r) => Rating(u-25,p-25,r) })
    --- End diff --
    
    Put a space after `,`. Put spaces around `-`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40883634
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40595739
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14174/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11693684
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = data._1.toArray
    +    val correct = data._2
    +    for (i <- 0 until ratings.length) {
    --- End diff --
    
    You can use pattern matching to simplify this block:
    ~~~
    val ratings = data._1.map { case Rating(i, j, r) =>
      Rating(i - 25, j - 25, r)
    }
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40595738
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40974245
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40884375
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40637948
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14181/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40909366
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14271/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40438216
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40970418
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40533843
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14153/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40630943
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14180/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40622319
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795347
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -114,6 +116,14 @@ class ALS private (
         this
       }
     
    +  private var partitioner: Partitioner = null
    +
    +  /** Sets the Partitioner that partitions users. */
    --- End diff --
    
    Whoops.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40533842
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40908961
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40622331
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40637946
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40624807
  
    Github thought some comments are outdated but they are not. Please click `show outdated diff` to see my comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40438517
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40626733
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14178/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40883661
  
    Handled most of your comments.  I didn't change the assert in the "negative ids" test and I left the partitioner null handling as is.  I don't see a nicer way to handle the latter at the moment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40633267
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11694640
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala ---
    @@ -128,6 +128,47 @@ class ALSSuite extends FunSuite with LocalSparkContext {
         assert(u11 != u2)
       }
     
    +  test("custom partitioner") {
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, null)
    +    testALS(50, 50, 2, 15, 0.7, 0.3, false, false, false, 3, new Partitioner {
    +      def numPartitions(): Int = 3
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    +        case _ => x.hashCode % 2
    +      }
    +    })
    +  }
    +
    +  test("negative ids") {
    +    val data = ALSSuite.generateRatings(50, 50, 2, 0.7, false, false)
    +    val ratings = data._1.toArray
    +    val correct = data._2
    +    for (i <- 0 until ratings.length) {
    +      var u = ratings(i).user
    +      var p = ratings(i).product
    +      var r = ratings(i).rating
    +      u = u - 25
    +      p = p - 25
    +      ratings(i) = new Rating(u,p,r)
    +    }
    +    val ratingsRDD = sc.parallelize(ratings)
    +
    +    val model = ALS.train(ratingsRDD, 5, 15)
    +
    +    var pairs = new Array[(Int, Int)](0)
    +    for (u <- -25 until 25; p <- -25 until 25) {
    +      pairs = pairs :+ (u, p)
    +    }
    +    val ans = model.predict(sc.parallelize(pairs)).collect
    +    for (r <- ans) {
    --- End diff --
    
    Use `foreach` instead of `for`. The former is faster.
    
    ~~~
    ans.foreach { r => 
    ...
    }
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40592198
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40626274
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40425685
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40561969
  
    @tmyklebu Could you add a test with negative user or product ids? You can use the data generator in ALSSuite and flip the signs of user and product ids.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40910729
  
    @tmyklebu Which commits do you not want in this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/407#discussion_r11795323
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -167,11 +177,23 @@ class ALS private (
           this.numBlocks
         }
     
    -    val partitioner = new HashPartitioner(numBlocks)
    +    val defaultPartitioner = new Partitioner {
    +      val numPartitions = numBlocks
    +
    +      def getPartition(x: Any): Int = x match {
    +        case null => 0
    --- End diff --
    
    We know we are partitioning integers, so we don't need pattern matching. This function implementation can be changed to
    
    ~~~
    Utils.nonNegativeMod(byteswap32(x.asInstanceOf[Int]), numPartitions)
    ~~~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40908559
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40974247
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14298/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40425598
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40883632
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40911257
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40908962
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/407#issuecomment-40911136
  
    Right, that's essentially how to do it: A forced push from a local checkout that is in the correct state.  So, if you originally made the PR from a branch called `PRbranch` and subsequently pushed a commit from that branch that you shouldn't have, then one fix procedure would be to do the following starting from your up-to-date checkout of `PRbranch`:
    
    git checkout -b Keeper
    git checkout PRbranch
    git reset --hard `SHA of the last commit that is supposed to be in the PR`
    git push origin +PRbranch
    
    After that, your local checkout, github repo, and the PR should all agree that `PRbranch` contains only the correct commits; and the new `Keeper` branch in your local repo has everything that is in `PRBranch` plus the extra commits.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---