Posted to reviews@spark.apache.org by vrilleup <gi...@git.apache.org> on 2014/07/12 01:19:36 UTC
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
GitHub user vrilleup opened a pull request:
https://github.com/apache/spark/pull/1378
use specialized axpy in RowMatrix for SVD
After running more tests on a large matrix, I found that the BV axpy (breeze/linalg/Vector.scala, axpy) is slower than the BSV axpy (breeze/linalg/operators/SparseVectorOps.scala, sv_dv_axpy): 8 s vs. 2 s for each multiplication. The BV axpy operates on an iterator, while the BSV axpy operates directly on the underlying array. I think the overhead comes from creating the iterator (with a zip) and advancing the pointers.
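To illustrate the difference described above, here is a minimal sketch in plain Scala with no Breeze dependency. It is not the Breeze source: `SparseVec`, `AxpySketch`, `axpyArray`, and `axpyIterator` are hypothetical names made up for this example, and the iterator version only approximates the zip-based pattern mentioned in the description. Both methods compute y += a * x for a sparse x and a dense y.

```scala
// Hypothetical minimal sparse vector (not a Breeze type):
// parallel arrays holding the indices and values of the non-zero entries.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])

object AxpySketch {
  /** y += a * x, reading x's backing arrays directly in a while loop. */
  def axpyArray(a: Double, x: SparseVec, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    val nnz = x.indices.length
    var k = 0
    while (k < nnz) {
      y(x.indices(k)) += a * x.values(k)
      k += 1
    }
  }

  /** Same result through a zipped iterator; each step builds an (index, value) tuple. */
  def axpyIterator(a: Double, x: SparseVec, y: Array[Double]): Unit = {
    require(x.size == y.length, "dimension mismatch")
    for ((i, v) <- x.indices.iterator.zip(x.values.iterator)) {
      y(i) += a * v
    }
  }
}
```

The two versions produce the same y. The array version avoids allocating the iterator and the per-element tuples, which is the kind of overhead the description suspects; actual timings would depend on the JVM and the data.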
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vrilleup/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1378.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1378
----
commit e1db950e91c7d9526519626aa252cd711307d857
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T01:05:18Z
SPARK-1782: svd for sparse matrix using ARPACK
copy ARPACK dsaupd/dseupd code from latest breeze
change RowMatrix to use sparse SVD
change tests for sparse SVD
commit 96d2ecb837843651db70d7505ddb73cfc0b0bf9a
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T06:03:35Z
improve eigenvalue sorting
commit fe983b0e7d62359275a92c2adaae8a635d7dd5d8
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T07:01:29Z
improve scala style
commit 9c8051594a88b53ce83b39b127a098b31bd89aad
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T08:25:58Z
use non-sparse implementation when k = n
commit 827411b7a7c7a44ec9cf0a3a3439bba0a47575f7
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T08:29:12Z
fix EOF new line
commit e7850ed465ceadd6a45132935013292a4845f8df
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-04T23:56:26Z
use aggregate and axpy
commit 4c7aec3d1c5203b4825047c66bed718211f9446c
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-07T01:33:47Z
improve comments
commit eb15100052aae878552aa437c41e548243a6a29e
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-13T06:36:18Z
fix binary compatibility
commit 819824b85acfc8ace9c15e0a9c5ce317604e4f73
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-18T02:11:53Z
add flag for dense svd or sparse svd
commit 5543cce3b7eba1bb3c4b5b8b43ca2c0399295044
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-23T23:27:27Z
improve svd api
commit 71484263409c03669be825b50714731fa9c46f6c
Author: Li Pu <lp...@twitter.com>
Date: 2014-06-26T07:09:48Z
improve RowMatrix multiply
commit c2737714b696d3cfae3b1efd0bde6a8d44a47b95
Author: Li Pu <lp...@twitter.com>
Date: 2014-07-07T20:49:29Z
automatically determine SVD compute mode and parameters
commit 62969fa4e06a715025483ed282b29427075bbbf1
Author: Xiangrui Meng <me...@databricks.com>
Date: 2014-07-09T00:54:54Z
use BDV directly in symmetricEigs
change the computation mode to local-svd, local-eigs, and dist-eigs
update tests and docs
commit 861ec48bc74616b47d45ad3b828097a35045050f
Author: Xiangrui Meng <me...@databricks.com>
Date: 2014-07-09T01:09:23Z
simplify axpy
commit a461082d98828501eccfbb59c8813c5fbd2ef826
Author: Xiangrui Meng <me...@databricks.com>
Date: 2014-07-09T01:43:18Z
make superscript show up correctly in doc
commit 4c618e917607b6d760f6192878173198399302c1
Author: Li Pu <li...@outlook.com>
Date: 2014-07-09T07:10:14Z
Merge pull request #1 from mengxr/vrilleup-master
Some updates to SVD impl
commit 7312ec10b1be13a41e46c4b8d164302c8497514a
Author: Li Pu <lp...@twitter.com>
Date: 2014-07-09T07:35:20Z
very minor comment fix
commit 5255f2a23ae979dcf809034bba658491ab8fd72a
Author: Li Pu <lp...@twitter.com>
Date: 2014-07-10T18:53:06Z
Merge remote-tracking branch 'upstream/master'
commit 6fb01a31ad967b849f5b738f22a64f8616d3177b
Author: Li Pu <lp...@twitter.com>
Date: 2014-07-11T23:12:43Z
use specialized axpy in RowMatrix
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48796422
QA tests have started for PR 1378. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16580/consoleFull
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1378
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by vrilleup <gi...@git.apache.org>.
Github user vrilleup commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48794029
Hmm, I did sync with the upstream branch before committing the last change, but it seems that the whole commit history is still there...
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48796259
Jenkins, add to whitelist.
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48803863
Merged.
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48796267
Jenkins, test this please.
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48803852
@vrilleup We squash commits before merging a PR. The commit history shows up because you used your master branch for this PR, and apache/master doesn't have those commits.
The change looks good to me. Thanks for testing the performance!
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48799322
QA results for PR 1378:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16580/consoleFull
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48793817
Can one of the admins verify this patch?
---
[GitHub] spark pull request: use specialized axpy in RowMatrix for SVD
Posted by vrilleup <gi...@git.apache.org>.
Github user vrilleup commented on the pull request:
https://github.com/apache/spark/pull/1378#issuecomment-48805362
@mengxr thank you for merging the change!
---