Posted to reviews@spark.apache.org by vrilleup <gi...@git.apache.org> on 2014/07/12 01:19:36 UTC

[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

GitHub user vrilleup opened a pull request:

    https://github.com/apache/spark/pull/1378

    use specialized axpy in RowMatrix for SVD

    After running more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8s vs. 2s for each multiplication. The BV axpy operates on an iterator, while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.
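
    The difference between the two access patterns can be sketched roughly as follows. This is a minimal illustration only, not the Breeze code: SparseVec, axpyIterator and axpyArray are hypothetical names, and the sparse vector is assumed to be stored as parallel arrays of indices and values. Both functions compute y += a * x for a sparse x and a dense y.

```scala
object AxpySketch {
  // Hypothetical minimal sparse vector: parallel arrays of active indices and values.
  final case class SparseVec(indices: Array[Int], values: Array[Double], size: Int)

  // Generic style: walk (index, value) pairs through a zipped iterator.
  // Each step goes through the iterator and allocates a tuple.
  def axpyIterator(a: Double, x: SparseVec, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    for ((i, v) <- x.indices.iterator.zip(x.values.iterator)) {
      y(i) += a * v
    }
  }

  // Specialized style: index the backing arrays directly in a while loop.
  def axpyArray(a: Double, x: SparseVec, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    val idx = x.indices
    val vals = x.values
    var k = 0
    while (k < idx.length) {
      y(idx(k)) += a * vals(k)
      k += 1
    }
  }
}
```

    Both variants produce the same result; only the traversal differs. The timings quoted above come from the tests on the real Breeze implementations, not from this sketch.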

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrilleup/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1378
    
----
commit e1db950e91c7d9526519626aa252cd711307d857
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T01:05:18Z

    SPARK-1782: svd for sparse matrix using ARPACK
    
    copy ARPACK dsaupd/dseupd code from latest breeze
    change RowMatrix to use sparse SVD
    change tests for sparse SVD

commit 96d2ecb837843651db70d7505ddb73cfc0b0bf9a
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T06:03:35Z

    improve eigenvalue sorting

commit fe983b0e7d62359275a92c2adaae8a635d7dd5d8
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T07:01:29Z

    improve scala style

commit 9c8051594a88b53ce83b39b127a098b31bd89aad
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T08:25:58Z

    use non-sparse implementation when k = n

commit 827411b7a7c7a44ec9cf0a3a3439bba0a47575f7
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T08:29:12Z

    fix EOF new line

commit e7850ed465ceadd6a45132935013292a4845f8df
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-04T23:56:26Z

    use aggregate and axpy

commit 4c7aec3d1c5203b4825047c66bed718211f9446c
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-07T01:33:47Z

    improve comments

commit eb15100052aae878552aa437c41e548243a6a29e
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-13T06:36:18Z

    fix binary compatibility

commit 819824b85acfc8ace9c15e0a9c5ce317604e4f73
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-18T02:11:53Z

    add flag for dense svd or sparse svd

commit 5543cce3b7eba1bb3c4b5b8b43ca2c0399295044
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-23T23:27:27Z

    improve svd api

commit 71484263409c03669be825b50714731fa9c46f6c
Author: Li Pu <lp...@twitter.com>
Date:   2014-06-26T07:09:48Z

    improve RowMatrix multiply

commit c2737714b696d3cfae3b1efd0bde6a8d44a47b95
Author: Li Pu <lp...@twitter.com>
Date:   2014-07-07T20:49:29Z

    automatically determine SVD compute mode and parameters

commit 62969fa4e06a715025483ed282b29427075bbbf1
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-07-09T00:54:54Z

    use BDV directly in symmetricEigs
    change the computation mode to local-svd, local-eigs, and dist-eigs
    update tests and docs

commit 861ec48bc74616b47d45ad3b828097a35045050f
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-07-09T01:09:23Z

    simplify axpy

commit a461082d98828501eccfbb59c8813c5fbd2ef826
Author: Xiangrui Meng <me...@databricks.com>
Date:   2014-07-09T01:43:18Z

    make superscript show up correctly in doc

commit 4c618e917607b6d760f6192878173198399302c1
Author: Li Pu <li...@outlook.com>
Date:   2014-07-09T07:10:14Z

    Merge pull request #1 from mengxr/vrilleup-master
    
    Some updates to SVD impl

commit 7312ec10b1be13a41e46c4b8d164302c8497514a
Author: Li Pu <lp...@twitter.com>
Date:   2014-07-09T07:35:20Z

    very minor comment fix

commit 5255f2a23ae979dcf809034bba658491ab8fd72a
Author: Li Pu <lp...@twitter.com>
Date:   2014-07-10T18:53:06Z

    Merge remote-tracking branch 'upstream/master'

commit 6fb01a31ad967b849f5b738f22a64f8616d3177b
Author: Li Pu <lp...@twitter.com>
Date:   2014-07-11T23:12:43Z

    use specialized axpy in RowMatrix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48796422
  
    QA tests have started for PR 1378. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16580/consoleFull



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1378



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by vrilleup <gi...@git.apache.org>.
Github user vrilleup commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48794029
  
    Hmm, I did sync with the upstream branch before committing the last change, but it seems that the whole commit history is still there...



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48796259
  
    Jenkins, add to whitelist.



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48803863
  
    Merged.



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48796267
  
    Jenkins, test this please.



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48803852
  
    @vrilleup We squash commits before merging a PR. The commit history shows up because you used your master branch for this PR, but apache/master doesn't have those commits.
    
    The change looks good to me. Thanks for testing the performance!



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48799322
  
    QA results for PR 1378:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes

    For more information see test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16580/consoleFull



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48793817
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD

Posted by vrilleup <gi...@git.apache.org>.
Github user vrilleup commented on the pull request:

    https://github.com/apache/spark/pull/1378#issuecomment-48805362
  
    @mengxr thank you for merging the change!

