Posted to user@spark.apache.org by Pasquinell Urbani <pa...@exalitica.com> on 2016/06/23 13:12:52 UTC
Change from distributed.MatrixEntry to Vector
Hello all,
I have to build an item-based recommendation system. First I obtained the
similarity matrix using cosine similarity with DIMSUM, the solution from Twitter
(https://blog.twitter.com/2014/all-pairs-similarity-via-dimsum). The
similarity matrix is in the following format:
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.distributed.MatrixEntry].
The matrix is named simsEstimate and is obtained from the following code:
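import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// M rows, U columns, NNZ non-zeros per row, NUMCHUNKS partitions (all assumed to be defined earlier)
val R = sc.parallelize(0 until M, NUMCHUNKS).flatMap { i =>
  // pick NNZ distinct random column indices for row i
  val inds = new scala.collection.mutable.TreeSet[Int]()
  while (inds.size < NNZ) {
    inds += scala.util.Random.nextInt(U)
  }
  inds.toArray.map(j => MatrixEntry(i, j, scala.math.random))
}
val mat = new CoordinateMatrix(R, M, U).toRowMatrix()
// DIMSUM with threshold 0.8; the result holds only the upper triangle of the similarities
val simsEstimate = mat.columnSimilarities(0.8)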
After this, I need to apply an ElementwiseProduct to the columns
of the similarity matrix, but it requires the data to be in
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] format.
Can anybody tell me how to convert the MatrixEntry data so that the
columns are available as
org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]?
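
To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind. It is not a confirmed solution. It assumes simsEstimate is the CoordinateMatrix returned by columnSimilarities, that its entries cover only the upper triangle (no diagonal, no duplicate entries), and that the number of columns fits in an Int. The helper name toColumnVectors is hypothetical.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.rdd.RDD

// Hypothetical helper: turns the similarity entries into one sparse
// vector per column, keyed by the column index.
def toColumnVectors(sims: CoordinateMatrix): RDD[(Long, Vector)] = {
  // Assumption: the matrix is square and small enough for Int indices.
  val n = sims.numCols().toInt
  sims.entries
    // Only the upper triangle is stored, so mirror each entry to
    // recover the full symmetric column.
    .flatMap { case MatrixEntry(i, j, v) =>
      Seq((j, (i.toInt, v)), (i, (j.toInt, v)))
    }
    .groupByKey()
    // Vectors.sparse accepts unordered (index, value) pairs.
    .mapValues(cells => Vectors.sparse(n, cells.toSeq))
}
```

With this, toColumnVectors(simsEstimate).values would be an RDD[Vector] that could be passed to ElementwiseProduct.transform, at the cost of losing the column index. Columns without any similarity above the threshold would not appear at all.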