You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by tmyklebu <gi...@git.apache.org> on 2014/04/26 21:26:23 UTC

[GitHub] spark pull request: Micro-optimisation of ALS

GitHub user tmyklebu opened a pull request:

    https://github.com/apache/spark/pull/568

    Micro-optimisation of ALS

    This change replaces some Scala `for` and `foreach` constructs with `while` constructs.  There may be a slight performance gain on the order of 1-2% when training an ALS model.
    
    I trained an ALS model on the Movielens 10M-rating dataset repeatedly both with and without these changes.  All 7 runs in both columns were done in a Scala `for` loop like this:
    
        for (iter <- 0 to 10) {
          val before = System.currentTimeMillis()
          val model = ALS.train(rats, 20, 10)
          val after = System.currentTimeMillis()
          println("%d ms".format(after-before))
          println("rmse %g".format(computeRmse(model, rats, numRatings)))
        }
    
    The timings were done on a multiuser machine, and I stopped one set of timings after 7 had been completed.  It would be nice if somebody with dedicated hardware could confirm my timings.
    
        After           Before
        121980 ms       122041 ms
        117069 ms       117127 ms
        115332 ms       117523 ms
        115381 ms       117402 ms
        114635 ms       116550 ms
        114140 ms       114076 ms
        112993 ms       117200 ms
    
    Ratios are about 1.0005, 1.0005, 1.019, 1.0175, 1.01671, 0.99944, and 1.03723.  I therefore suspect these changes make for a slight performance gain on the order of 1-2%.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tmyklebu/spark alsopt

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #568
    
----
commit c774d7d4bff91c9387d059d1189799fa0ff1f4b0
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-14T22:01:18Z

    Make the partitioner a member variable and use it instead of modding directly.

commit c90b6d8e91f86cf89adf28de6f9185647c87e5c8
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-14T22:10:30Z

    Scramble user and product ids before bucketing.

commit df27697649de50d364c42c76aebaebb34cbe87e2
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-15T19:47:17Z

    Support custom partitioners.  Currently we use the same partitioner for users and products.

commit d872b098d41c4fc088e579e8fe199aca149bca64
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T12:19:48Z

    Add negative id ALS test.

commit 36a0f43519a1e8ea800b960157f8c8b050139105
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T16:42:31Z

    Make the partitioner private.

commit 5ec9e6cd237c4ac7c1b597614c880ae75bacceee
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T17:00:39Z

    Clean a couple of things up using 'map'.

commit f8413451c807282100a9be506ca2c992abb81918
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T17:12:47Z

    Fix daft bug creating 'pairs', also for -> foreach.

commit 40edc235e59aab56d6f65c73ffe98859c78a889b
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-16T18:14:38Z

    Fix missing space.

commit 674933abb7a373dc1c913467d668bad9045e560f
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:36:52Z

    Fix style.

commit 495784f2a172957ab490e0f77ea504c0179ab798
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-19T23:41:23Z

    Merge branch 'master' of https://github.com/apache/spark

commit 23d6f91b52c88b7006ec78496f777b72b1881bb4
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-21T00:06:19Z

    Stop making the partitioner configurable.

commit dcf583ac4001c6da8d6b85e45e88043861a351d8
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-21T19:56:48Z

    Remove the partitioner member variable; instead, thread that needle everywhere it needs to go.

commit 114fb740c88da23372d2aaf6a12640d17b1408a1
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-25T23:19:33Z

    Turn some 'for' loops into 'while' loops.

commit 4ef0313c80425f51fd42fe47a4f3ff4060471d03
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-26T19:11:52Z

    Merge branch 'master' of github.com:apache/spark into alsopt

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478309
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478267
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41479481
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14514/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41479480
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080170
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -491,11 +494,17 @@ class ALS private (
                 if (rs(i) > 0) {
                   SimpleBlas.axpy(confidence, x, userXy(us(i)))
                 }
    -          } else {
    +            i = i + 1
    --- End diff --
    
    `i += 1` is more common in Spark code base.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080163
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -474,13 +474,16 @@ class ALS private (
         // Compute the XtX and Xy values for each user by adding products it rated in each product
         // block
         for (productBlock <- 0 until numBlocks) {
    -      for (p <- 0 until blockFactors(productBlock).length) {
    +      var i = 0
    +      var p = 0
    +      while (p < blockFactors(productBlock).length) {
             val x = wrapDoubleArray(blockFactors(productBlock)(p))
             tempXtX.fill(0.0)
             dspr(1.0, x, tempXtX)
             val (us, rs) = inLinkBlock.ratingsForBlock(productBlock)(p)
    -        for (i <- 0 until us.length) {
    -          if (implicitPrefs) {
    +        if (implicitPrefs) {
    +          i = 0
    --- End diff --
    
    `var i = 0`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41645790
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14556/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080172
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -491,11 +494,17 @@ class ALS private (
                 if (rs(i) > 0) {
                   SimpleBlas.axpy(confidence, x, userXy(us(i)))
                 }
    -          } else {
    +            i = i + 1
    +          }
    +        } else {
    +          i = 0
    --- End diff --
    
    Ditto. `var i = 0`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41749414
  
    LGTM. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478310
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14513/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080177
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -491,11 +494,17 @@ class ALS private (
                 if (rs(i) > 0) {
                   SimpleBlas.axpy(confidence, x, userXy(us(i)))
                 }
    -          } else {
    +            i = i + 1
    +          }
    +        } else {
    +          i = 0
    +          while (i < us.length) {
                 userXtX(us(i)).addi(tempXtX)
                 SimpleBlas.axpy(rs(i), x, userXy(us(i)))
    +            i = i + 1
               }
             }
    +        p = p + 1
    --- End diff --
    
    `p += 1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080182
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -504,7 +513,11 @@ class ALS private (
           // Compute the full XtX matrix from the lower-triangular part we got above
           fillFullMatrix(userXtX(index), fullXtX)
           // Add regularization
    -      (0 until rank).foreach(i => fullXtX.data(i*rank + i) += lambda)
    +      var i = 0
    +      while (i < rank) {
    +        fullXtX.data(i * rank + i) += lambda
    +        i = i + 1
    --- End diff --
    
    `i += 1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478262
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41761125
  
    Merged. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41644144
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478410
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41478408
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080175
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -491,11 +494,17 @@ class ALS private (
                 if (rs(i) > 0) {
                   SimpleBlas.axpy(confidence, x, userXy(us(i)))
                 }
    -          } else {
    +            i = i + 1
    +          }
    +        } else {
    +          i = 0
    +          while (i < us.length) {
                 userXtX(us(i)).addi(tempXtX)
                 SimpleBlas.axpy(rs(i), x, userXy(us(i)))
    +            i = i + 1
    --- End diff --
    
    `i += 1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41645788
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/568


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/568#discussion_r12080160
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -474,13 +474,16 @@ class ALS private (
         // Compute the XtX and Xy values for each user by adding products it rated in each product
         // block
         for (productBlock <- 0 until numBlocks) {
    -      for (p <- 0 until blockFactors(productBlock).length) {
    +      var i = 0
    --- End diff --
    
    This `i` is a little far from the while loop that actually uses it. Maybe it is better to define it just before `while`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1646] Micro-optimisation of ALS

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/568#issuecomment-41644138
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---