You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by srowen <gi...@git.apache.org> on 2014/02/21 17:34:19 UTC

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

GitHub user srowen opened a pull request:

    https://github.com/apache/incubator-spark/pull/629

    MLLIB-25: Implicit ALS runs out of memory for moderately large numbers of features

    There's a step in implicit ALS where the matrix `Yt * Y` is computed. It's computed as the sum of matrices; an f x f matrix is created for each of n user/item rows in a partition. In `ALS.scala:214`:
    
    ```
            factors.flatMapValues{ case factorArray =>
              factorArray.map{ vector =>
                val x = new DoubleMatrix(vector)
                x.mmul(x.transpose())
              }
            }.reduceByKeyLocally((a, b) => a.addi(b))
             .values
             .reduce((a, b) => a.addi(b))
    ```
    
    Completely correct, but there's a subtle but quite large memory problem here. map() is going to create all of these matrices in memory at once, when they don't need to ever all exist at the same time.
    For example, if a partition has n = 100000 rows, and f = 200, then this intermediate product requires 32GB of heap. The computation will never work unless you can cough up workers with (more than) that much heap.
    
    Fortunately there's a trivial change that fixes it; just add `.view` in there.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/incubator-spark ALSMatrixAllocationOptimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/629.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #629
    
----
commit e9a5d636b8a5d6288924ddf2871645c4eea41ffe
Author: Sean Owen <so...@cloudera.com>
Date:   2014-02-21T09:37:39Z

    Avoid unnecessary out of memory situation by not simultaneously allocating lots of matrices

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35763013
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35750636
  
    LGTM if Travis passes (no reason not). Thanks for the fix! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/629#discussion_r9956734
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -212,7 +212,7 @@ class ALS private (var numBlocks: Int, var rank: Int, var iterations: Int, var l
         if (implicitPrefs) {
           Option(
             factors.flatMapValues{ case factorArray =>
    -          factorArray.map{ vector =>
    +          factorArray.view.map{ vector =>
    --- End diff --
    
    mind adding a space after map? technically not your problem, but while you are at it ...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35756550
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-spark/pull/629


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35750760
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/629#discussion_r9956743
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -212,7 +212,7 @@ class ALS private (var numBlocks: Int, var rank: Int, var iterations: Int, var l
         if (implicitPrefs) {
           Option(
             factors.flatMapValues{ case factorArray =>
    -          factorArray.map{ vector =>
    +          factorArray.view.map{ vector =>
    --- End diff --
    
    and for the above flatMapValues


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35768223
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12804/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35771699
  
    Thanks. I've merged this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35768220
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35763015
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35750759
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35776515
  
    Yup, good idea. I also cherry picked it into branch-0.9.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/629#discussion_r9957753
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -212,7 +212,7 @@ class ALS private (var numBlocks: Int, var rank: Int, var iterations: Int, var l
         if (implicitPrefs) {
           Option(
             factors.flatMapValues{ case factorArray =>
    -          factorArray.map{ vector =>
    +          factorArray.view.map{ vector =>
    --- End diff --
    
    No problem, it's done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35775112
  
    @rxin would it make sense to backport this? Seems like a bug fix


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35835626
  
    @srowen good catch, thanks Sean. Didn't really think about this when I wrote it. Shows that testing on larger scale input data / params is always required!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: MLLIB-25: Implicit ALS runs out of m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/629#issuecomment-35756551
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12802/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---