Posted to reviews@spark.apache.org by tmyklebu <gi...@git.apache.org> on 2014/04/18 06:11:28 UTC

[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

GitHub user tmyklebu opened a pull request:

    https://github.com/apache/spark/pull/442

    ALS: Avoid the garbage-creating ctor of DoubleMatrix

    `new DoubleMatrix(double[])` creates a garbage `double[]` of the same length as its argument and immediately throws it away.  This pull request avoids that constructor in the ALS code.
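
    A minimal sketch of the allocation pattern described above. It uses a stand-in class rather than jblas itself: `GarbageCtorSketch`, `MatrixSketch`, `allocate`, `allocations`, and `wrap` are hypothetical names for illustration, and the constructor chain is an assumption modeled on the description in this pull request, not a copy of the jblas source.

```scala
object GarbageCtorSketch {
  // Counts backing arrays allocated by the stand-in class (illustration only).
  var allocations = 0

  private def allocate(n: Int): Array[Double] = {
    allocations += 1
    new Array[Double](n)
  }

  // Stand-in for a column-major matrix; NOT org.jblas.DoubleMatrix.
  final class MatrixSketch(val rows: Int, val columns: Int, initial: Array[Double]) {
    var data: Array[Double] = initial

    // Models a size-only constructor: allocates fresh storage.
    def this(rows: Int, columns: Int) =
      this(rows, columns, GarbageCtorSketch.allocate(rows * columns))

    // Models the constructor this pull request avoids: it delegates to the
    // size-only constructor, then overwrites the freshly allocated buffer.
    def this(values: Array[Double]) = {
      this(values.length, 1) // allocates values.length doubles
      data = values          // the buffer allocated above is now garbage
    }
  }

  // Models the helper in the patch: hand the array straight to the
  // three-argument constructor, so nothing extra is allocated.
  def wrap(v: Array[Double]): MatrixSketch = new MatrixSketch(v.length, 1, v)
}
```

    In this model, `new GarbageCtorSketch.MatrixSketch(v)` bumps `allocations` by one, while `GarbageCtorSketch.wrap(v)` leaves it unchanged; both end up sharing `v` as their backing array.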

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tmyklebu/spark foo2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #442
    
----
commit a09904f4e50ad5f15308c5bf515e4dd49f5ba718
Author: Tor Myklebust <tm...@gmail.com>
Date:   2014-04-18T03:55:01Z

    Helper function for wrapping Array[Double]'s with DoubleMatrix's.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

Posted by mateiz <gi...@git.apache.org>.
Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40882048
  
    Merged into master and branch 1.0, thanks!



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40814591
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/442



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40786793
  
    Merged build finished. 



[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40879990
  
    I doubt HotSpot can eliminate this allocation, though I'm not sure. It would have to do several things: inline the calls to two other constructors, prove there is no threading issue in the constructor, and recognize the dead store. I think it's worth avoiding manually here.



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40794622
  
    @tmyklebu Thanks for fixing this! Could you create a JIRA?



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40785546
  
     Merged build triggered. 



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40811701
  
    Merged build started. 



[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40879543
  
    @srowen: Does Hotspot actually generate code for the allocation and the dead store with the bad ctor?  I haven't picked through it yet.



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40811687
  
     Merged build triggered. 



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40826930
  
    [SPARK-1535] describes the issue and the form of the fix.



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/442#discussion_r11766593
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -303,6 +303,13 @@ class ALS private (
       }
     
       /**
    +   * Wrap a double array in a DoubleMatrix without creating garbage.
    +   */
    +  private def wrapDoubleArray(v: Array[Double]): DoubleMatrix = {
    +    new DoubleMatrix(v.length, 1, v:_*)
    --- End diff --
    
    put a space after `:`
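
    For reference, `v: _*` is Scala's sequence-argument ascription, which passes a collection where a repeated (varargs) parameter is expected. The sketch below shows the spaced form requested here. `VarargsSketch` and `total` are hypothetical names, not part of ALS.scala, and the example targets a Scala varargs method rather than the Java varargs constructor in the diff, so it illustrates the syntax only and says nothing about copying behavior.

```scala
object VarargsSketch {
  // Hypothetical helper with a repeated parameter (Scala varargs).
  def total(xs: Double*): Double = xs.sum

  def main(args: Array[String]): Unit = {
    val v = Array(1.0, 2.0, 3.0)
    // Spaced form: `v: _*` rather than `v:_*`.
    println(total(v: _*))
  }
}
```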



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40845513
  
    @tmyklebu Could you add the JIRA number to the title of this PR? It makes life easier for people who merge the code.



[GitHub] spark pull request: [SPARK-1535] ALS: Avoid the garbage-creating c...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40879445
  
    LGTM. Thanks for fixing it! @matei Could you help merge this into both master and branch-1.0?



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40788640
  
    +1. BTW, I submitted a fix for this in jblas, although it is not yet released.



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40814592
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14238/



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40786797
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14226/



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40785554
  
    Merged build started. 



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/442#discussion_r11766710
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
    @@ -303,6 +303,13 @@ class ALS private (
       }
     
       /**
    +   * Wrap a double array in a DoubleMatrix without creating garbage.
    +   */
    +  private def wrapDoubleArray(v: Array[Double]): DoubleMatrix = {
    +    new DoubleMatrix(v.length, 1, v:_*)
    --- End diff --
    
    (PS might also note in the comment that it's safe to go back to the old DoubleMatrix constructor at jblas 1.2.4 or later.)



[GitHub] spark pull request: ALS: Avoid the garbage-creating ctor of Double...

Posted by tmyklebu <gi...@git.apache.org>.
Github user tmyklebu commented on the pull request:

    https://github.com/apache/spark/pull/442#issuecomment-40787028
  
    This appears to be a PySpark error unrelated to my change.

