Posted to issues@spark.apache.org by "Leonard Papenmeier (Jira)" <ji...@apache.org> on 2022/10/26 11:59:00 UTC
[jira] [Created] (SPARK-40920) SVD: matrix U has wrong column order

Leonard Papenmeier created SPARK-40920:
------------------------------------------

             Summary: SVD: matrix U has wrong column order
                 Key: SPARK-40920
                 URL: https://issues.apache.org/jira/browse/SPARK-40920
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 3.3.0
         Environment: Python 3.10, multi-core machine, no cluster
            Reporter: Leonard Papenmeier
         Attachments: image-2022-10-26-13-58-52-998.png

When performing SVD on a RowMatrix, the matrix U has the wrong row order, and the original matrix is not correctly reconstructed from the returned factors.

Consider the following code:
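{code:python}
import numpy as np
from pyspark import SparkContext
from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry

ctx = SparkContext.getOrCreate()  # assumed setup; the report does not show how ctx is created

x_np = np.random.random((14, 3))  # the size matters: it works for smaller sizes
# Tag each row with its index, then emit one MatrixEntry(row, col, value) per element
entries = ctx.parallelize(x_np).zipWithIndex().flatMap(
    lambda r: [MatrixEntry(r[1], i, r[0][i]) for i in range(len(r[0]))])
x = CoordinateMatrix(entries)
x_inv = matrix_inverse(x) {code}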
with 
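{code:python}
import numpy as np
from pyspark.mllib.linalg import DenseMatrix
from pyspark.mllib.linalg.distributed import CoordinateMatrix


def matrix_inverse(matrix: CoordinateMatrix) -> DenseMatrix:
    mtrx = matrix.toRowMatrix()
    svd = mtrx.computeSVD(k=mtrx.numCols(), computeU=True, rCond=1e-15)  # do the SVD

    s_inv = 1 / svd.s.toArray()
    # Dense local copy of the input; assumes the whole matrix fits in a single block
    mtrx_orig = matrix.toBlockMatrix().blocks.first()[1].toArray()
    v = svd.V.toArray()
    # U = A V S^-1, computed locally from the input and Spark's V and s
    u_dense = mtrx_orig @ (v * s_inv[np.newaxis, :])
    # Pseudoinverse A^+ = V S^-1 U^T
    cov_inv = v @ (s_inv[:, np.newaxis] * u_dense.T)
    # Spark's U, collected only for comparison with u_dense
    u_from_spark = np.array(svd.U.rows.map(lambda row: row.toArray()).collect())
    # DenseMatrix expects values in column-major order
    return DenseMatrix(numRows=cov_inv.shape[0], numCols=cov_inv.shape[1],
                       values=cov_inv.ravel(order="F"))  # return inverse as dense matrix {code}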
Then u_dense is the correct U, but it differs from the U produced by Spark. In particular, substituting Spark's U into V S^-1 U^T does not give the correct pseudoinverse, and U @ S @ V.T (with S = diag(s)) does not reproduce the input matrix.
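
For reference, here is a minimal NumPy-only sketch of the identities that matrix_inverse relies on. It assumes a local np.linalg.svd as a stand-in for Spark's computeSVD (no Spark involved) and a random 14 x 3 input like the one above. With A = U diag(s) V^T, U can be recovered as A V diag(1/s), U diag(s) V^T reproduces A, and V diag(1/s) U^T is the pseudoinverse when A has full column rank.

{code:python}
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((14, 3))  # same shape as the input in the report

# Thin SVD computed locally (stand-in for Spark's computeSVD)
u, s, vt = np.linalg.svd(a, full_matrices=False)
v = vt.T
s_inv = 1 / s

# Recover U from A, V and s, the same way u_dense is computed in matrix_inverse
u_dense = a @ (v * s_inv[np.newaxis, :])

# U diag(s) V^T should reproduce the input
reconstructed = u_dense @ np.diag(s) @ v.T

# V diag(1/s) U^T is the Moore-Penrose pseudoinverse for a full-column-rank A
pinv = v @ (s_inv[:, np.newaxis] * u_dense.T)

print(np.allclose(u_dense, u))
print(np.allclose(reconstructed, a))
print(np.allclose(pinv, np.linalg.pinv(a)))
{code}

If Spark's U satisfied the same identities, replacing u_dense with it in the last two checks would also give True; the report observes that it does not.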

With the following input matrix x

!image-2022-10-26-13-56-45-117.png!

I get the following u_dense

!image-2022-10-26-13-56-59-157.png!

but the following u_from_spark

!image-2022-10-26-13-57-15-396.png!

On careful inspection, it seems that the row order is wrong.
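
One way to check the row-order hypothesis is to look for a permutation that maps the rows of u_from_spark onto the rows of u_dense. The sketch below does this with NumPy only. The helper name match_rows and its tolerance are hypothetical, and the demo uses a synthetically shuffled matrix rather than actual Spark output.

{code:python}
import numpy as np


def match_rows(expected: np.ndarray, actual: np.ndarray, atol: float = 1e-8):
    """Hypothetical helper: return perm such that actual[perm] matches expected
    row by row (each row of `actual` used at most once), or None if no such
    row permutation exists within `atol`."""
    if expected.shape != actual.shape:
        return None
    used = set()
    perm = []
    for row in expected:
        # Largest elementwise difference between this row and every row of `actual`
        dists = np.abs(actual - row).max(axis=1)
        candidates = [int(i) for i in np.argsort(dists)
                      if int(i) not in used and dists[i] <= atol]
        if not candidates:
            return None
        used.add(candidates[0])
        perm.append(candidates[0])
    return np.array(perm)


rng = np.random.default_rng(1)
u_dense = rng.random((14, 3))
shuffle = rng.permutation(14)
u_from_spark = u_dense[shuffle]  # simulated: same rows, different order

perm = match_rows(u_dense, u_from_spark)
print(perm is not None and np.allclose(u_from_spark[perm], u_dense))
print(match_rows(u_dense, u_dense + 0.1))  # not a reordering, so None
{code}

If match_rows finds a permutation for the real u_dense and u_from_spark, Spark's U contains the correct rows in a different order; if it returns None, the discrepancy is more than a reordering.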
