Posted to user@spark.apache.org by buring <qy...@gmail.com> on 2014/09/29 15:58:52 UTC

The confusion order of rows in SVD matrix ?

Hi,
	I want to use SVD in my work. I tried some examples and am confused by
the results. The input is the following 4*3 matrix:
	2 0 0
	0 3 2
	0 3 1
	2 0 3
	My input file, which corresponds to this matrix, contains one
"row column value" entry per line:
	0 0 2
	1 1 3
	1 2 2
	2 2 1
	2 1 3
	3 0 2
	3 2 3 
	After running the SVD algorithm, I tried to recompute the input matrix as
U*S*V.T, but the rows of the result are not in the expected order. I
printed it out:
	2 0 0
	0 3 1
	0 3 2
	2 0 3
	The second and third rows have been exchanged, which confuses me. Can anyone explain?

	My code is as follows:
    val inputData = sc.textFile(fname).map { line =>
      val parts = line.trim.split(' ')
      (parts(0).toLong, parts(1).toInt, parts(2).toDouble)
    }

    val dataRows = inputData.groupBy(_._1).map[(Long, Vector)] { row =>
      val (indices, values) = row._2.map(e => (e._2, e._3)).unzip
      (row._1, new SparseVector(ncol, indices.toArray, values.toArray))
    }
    val data = dataRows.take(dataRows.count().toInt).map(e => e._1 + e._2.toArray.mkString(";"))
    logInfo("----------------")
    logInfo(data(0))
    logInfo(data(1))
    logInfo(data(2))
    logInfo(data(3))

    The code above prints the rows, and the printed matrix is the same as the
matrix recomputed from U*S*V.T. However, the order of the rows has changed,
which I can't understand. I want the rows in the same order as in the input
file. How can I guarantee this?

    Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/The-confusion-order-of-rows-in-SVD-matrix-tp15337.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: The confusion order of rows in SVD matrix ?

Posted by Sean Owen <so...@cloudera.com>.
The RDD you define has no particular ordering, so the order in which you
encounter the elements ("rows") with an operation like take or collect
isn't defined. You can try to sort the RDD by the row number before
that key is discarded.
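
As a rough sketch of what sorting by the row number could look like (not
tested against the original job): the object name SortedRowsSketch, the
local[2] master and the inlined triples are assumptions made for
illustration, and Vectors.sparse is used in place of the SparseVector
constructor because it accepts (index, value) pairs in any order.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on Spark 1.x
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical stand-alone driver; only the groupBy step mirrors the question.
object SortedRowsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sorted-rows-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ncol = 3 // number of columns in the example matrix

    // The (row, column, value) triples from the question, inlined
    // here instead of being read from a file.
    val entries = sc.parallelize(Seq(
      (0L, 0, 2.0), (1L, 1, 3.0), (1L, 2, 2.0), (2L, 2, 1.0),
      (2L, 1, 3.0), (3L, 0, 2.0), (3L, 2, 3.0)))

    // Build one sparse vector per row, keeping the row number as the key.
    val dataRows = entries.groupBy(_._1).map { case (rowIndex, group) =>
      val cols = group.map(e => (e._2, e._3)).toSeq
      (rowIndex, Vectors.sparse(ncol, cols): Vector)
    }

    // Sort by the row number while the key is still available, then collect.
    val sorted = dataRows.sortByKey().collect()
    sorted.foreach { case (i, v) => println(s"$i: ${v.toArray.mkString(" ")}") }

    sc.stop()
  }
}
```

If the row numbers also need to survive the decomposition itself, building
an IndexedRowMatrix (org.apache.spark.mllib.linalg.distributed) from
IndexedRow(index, vector) pairs may be worth a look, since its computeSVD
returns U as an IndexedRowMatrix whose rows still carry their indices.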
