Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2014/07/01 18:51:15 UTC

[GitHub] spark pull request: SPARK-1782: svd for sparse matrix using ARPACK

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/964#issuecomment-47681084
  
    @vrilleup Just checked Matlab’s svd and svds. I don’t remember ever using options.{tol, maxit}, and I wonder whether they are useful to expose to users. I did use RCOND before because I needed to compute a very accurate solution, but that work was purely academic. In MLlib’s implementation we take the A^T A approach, which cannot give us very accurate small singular values when the matrix is ill-conditioned (forming A^T A squares the condition number), so this is not useful either. My suggestion for the type signature is simply:
    
    ~~~
    def computeSVD(k: Int, computeU: Boolean)
    ~~~
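    
    As an aside, here is a small NumPy sketch of why the A^T A route loses small singular values (Python purely for illustration, not MLlib code; the matrix `a` and its singular values are made up for the demo). Eigenvalues of A^T A are the squared singular values, so a singular value much smaller than sqrt(machine epsilon) times the largest one drowns in rounding noise, while a direct SVD still recovers it:
    
    ~~~
    import numpy as np
    
    rng = np.random.default_rng(0)
    # Hypothetical ill-conditioned test matrix with known singular values.
    u, _ = np.linalg.qr(rng.standard_normal((50, 2)))
    v, _ = np.linalg.qr(rng.standard_normal((2, 2)))
    true_sv = np.array([1.0, 1e-10])          # condition number 1e10
    a = u @ np.diag(true_sv) @ v.T
    
    # Gram route: eigenvalues of A^T A are the squared singular values, so
    # the smaller one (1e-20) sits far below machine epsilon relative to 1.
    evals = np.linalg.eigvalsh(a.T @ a)
    sv_gram = np.sqrt(np.clip(np.sort(evals)[::-1], 0.0, None))
    
    # Direct SVD of A only deals with the original condition number.
    sv_direct = np.linalg.svd(a, compute_uv=False)
    
    err_gram = abs(sv_gram[1] - true_sv[1]) / true_sv[1]
    err_direct = abs(sv_direct[1] - true_sv[1]) / true_sv[1]
    print(err_gram, err_direct)   # gram error is orders of magnitude worse
    ~~~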
    
    Let’s estimate the complexity of the dense approach and the iterative approach and decide which to use internally. We can expose advanced options later, e.g. rcond, iter, method: {"dense", "arpack"}, etc. What do you think?
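    
    For reference, the iterative route can be sketched matrix-free in SciPy (again Python for illustration only, not MLlib code; `eigsh` is SciPy’s ARPACK wrapper, and the matrix sizes are arbitrary). ARPACK is applied to the operator x -> A^T (A x), so A^T A is never materialized; the top-k singular values are the square roots of the top eigenvalues, and U = A V Sigma^{-1} gives the left singular vectors when computeU is requested:
    
    ~~~
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, eigsh
    
    rng = np.random.default_rng(42)
    a = rng.standard_normal((200, 30))        # stand-in for a RowMatrix
    k = 5
    
    # Matrix-free A^T A: two mat-vecs per ARPACK iteration; the Gram
    # matrix itself is never formed.
    op = LinearOperator((30, 30), matvec=lambda x: a.T @ (a @ x))
    evals, evecs = eigsh(op, k=k, which="LM")
    
    order = np.argsort(evals)[::-1]           # descending singular values
    sv_arpack = np.sqrt(evals[order])
    v_k = evecs[:, order]
    
    # computeU = true: recover U as A V Sigma^{-1}.
    u_k = (a @ v_k) / sv_arpack
    
    # Cross-check the top-k values against a dense SVD.
    sv_dense = np.linalg.svd(a, compute_uv=False)[:k]
    print(np.max(np.abs(sv_arpack - sv_dense)))
    ~~~
    
    The dense route is preferable for small n (the Gram matrix is n x n), while the ARPACK route only needs mat-vecs and so scales to large sparse A.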


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---