Posted to dev@mahout.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/05/04 02:01:23 UTC
[jira] [Commented] (MAHOUT-1837) Sparse/Dense Matrix analysis for Matrix Multiplication
[ https://issues.apache.org/jira/browse/MAHOUT-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15269875#comment-15269875 ]
ASF GitHub Bot commented on MAHOUT-1837:
----------------------------------------
Github user andrewpalumbo commented on a diff in the pull request:
https://github.com/apache/mahout/pull/228#discussion_r61976030
--- Diff: math-scala/src/main/scala/org/apache/mahout/math/scalabindings/package.scala ---
@@ -410,4 +412,34 @@ package object scalabindings {
def dist(mxX: Matrix, mxY: Matrix): Matrix = sqDist(mxX, mxY) := sqrt _
+ /**
+ * Check the density of an in-core matrix based on supplied criteria.
+ *
+ * @param mxX The matrix to check density of.
+ * @param rowSparsityThreshold the proportion of the sampled rows which must be dense for the matrix to be considered dense.
+ * @param elementSparsityThreshold the proportion of elements in a sampled row which must be non-zero for that row to be considered dense.
+ * @param sample the proportion of the matrix's rows to sample.
+ */
+ def isMatrixDense(mxX: Matrix, rowSparsityThreshold: Double = .30, elementSparsityThreshold: Double = .30, sample: Double = .25): Boolean = {
+ val rand = RandomUtils.getRandom
+ val m = mxX.numRows()
+ val numRowToTest: Int = (sample * m).toInt
+
+ var numDenseRows: Int = 0
+
+ for (i <- 0 until numRowToTest) {
+ // select a row at random
+ val row: Vector = mxX(rand.nextInt(m), ::)
+ // check the density of that row; if it is greater than the element sparsity threshold, count this row as dense
+ if (row.getNumNonZeroElements / row.size().toDouble > elementSparsityThreshold) {
+ numDenseRows = numDenseRows + 1
+ }
+ }
+
+ // return whether (dense rows / tested rows) > rowSparsityThreshold;
+ // convert to Double first so the ratio is not truncated by integer division
+ numDenseRows.toDouble / numRowToTest > rowSparsityThreshold
+ }
+
--- End diff --
@dlyubimov does this seem like a decent test for matrix density? I've put in both an `elementSparsityThreshold` to determine whether a Vector itself is sparse, and a `rowSparsityThreshold` as a threshold for the entire matrix. I've also added a `Vector.mean()` method but am not sure if it is needed in this case.
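For discussion, here is a minimal standalone sketch of the same sampling idea on plain Scala arrays rather than a Mahout `Matrix`. The object name `DensitySketch`, the method names, and the fixed-seed `Random` are hypothetical and only for illustration. It computes the final ratio in floating point and always tests at least one row, which is one possible way to handle very small matrices, not necessarily what the PR should do.

```scala
import scala.util.Random

// Hypothetical illustration on plain arrays; not the Mahout API.
object DensitySketch {

  /** Fraction of non-zero entries in a single row. */
  def rowDensity(row: Array[Double]): Double =
    if (row.isEmpty) 0.0 else row.count(_ != 0.0).toDouble / row.length

  /**
   * Sample-based density check: a row counts as dense when its fraction of
   * non-zero elements exceeds `elementThreshold`; the matrix counts as dense
   * when the fraction of dense sampled rows exceeds `rowThreshold`.
   */
  def isDense(mx: Array[Array[Double]],
              rowThreshold: Double = 0.30,
              elementThreshold: Double = 0.30,
              sample: Double = 0.25,
              rand: Random = new Random(42)): Boolean = {
    val m = mx.length
    if (m == 0) {
      false
    } else {
      // Assumption: test at least one row so tiny matrices are still sampled.
      val numRowsToTest = math.max(1, (sample * m).toInt)
      // Rows are drawn with replacement, as in the diff above.
      val numDenseRows = (0 until numRowsToTest).count { _ =>
        rowDensity(mx(rand.nextInt(m))) > elementThreshold
      }
      // Floating-point ratio, so fractional results are compared correctly.
      numDenseRows.toDouble / numRowsToTest > rowThreshold
    }
  }
}
```

With the default thresholds, an 8 x 8 matrix of ones is reported as dense and an 8 x 8 identity matrix (row density 0.125) is not.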
> Sparse/Dense Matrix analysis for Matrix Multiplication
> ------------------------------------------------------
>
> Key: MAHOUT-1837
> URL: https://issues.apache.org/jira/browse/MAHOUT-1837
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.12.0
> Reporter: Andrew Palumbo
> Assignee: Andrew Palumbo
> Fix For: 0.12.1
>
>
> In matrix multiplication, sparse matrices can easily turn dense and bloat memory: one fully dense column and one fully dense row can cause a sparse %*% sparse operation to have a dense result.
> There are two issues here, one with a quick fix and one a bit more involved:
> # In {{ABt.Scala}}, check the `MatrixFlavor` of the combiner and use the flavor of the block as the resulting sparse or dense matrix type:
> {code}
> val comb = if (block.getFlavor == MatrixFlavor.SPARSELIKE) {
> new SparseMatrix(prodNCol, block.nrow).t
> } else {
> new DenseMatrix(prodNCol, block.nrow).t
> }
> {code}
> A similar check needs to be made in the {{blockify}} transformation.
>
> # More importantly, and more involved, is to do an actual analysis of the resulting matrix data in the in-core {{mmul}} class and use a matrix of the appropriate structure as the result.
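To make the fill-in claim from the issue description concrete, the following is a small sketch on plain Scala arrays (again not Mahout types; `FillInSketch`, `mmul`, and `density` are hypothetical names used only here). It builds one matrix whose only non-zero entries are in its first column and another whose only non-zero entries are in its first row, then multiplies them with a naive triple loop.

```scala
// Hypothetical illustration on plain arrays; not the Mahout in-core mmul.
object FillInSketch {
  type Mx = Array[Array[Double]]

  /** Naive dense-array product; only meant for tiny illustrative inputs. */
  def mmul(a: Mx, b: Mx): Mx =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      b.indices.map(k => a(i)(k) * b(k)(j)).sum
    }

  /** Fraction of non-zero entries in the whole matrix. */
  def density(mx: Mx): Double =
    mx.map(_.count(_ != 0.0)).sum.toDouble / (mx.length * mx(0).length)

  def main(args: Array[String]): Unit = {
    val n = 10
    // a: only the first column is non-zero; b: only the first row is non-zero.
    val a: Mx = Array.tabulate(n, n)((i, j) => if (j == 0) 1.0 else 0.0)
    val b: Mx = Array.tabulate(n, n)((i, j) => if (i == 0) 1.0 else 0.0)
    val c = mmul(a, b)
    println(s"density(a) = ${density(a)}")
    println(s"density(b) = ${density(b)}")
    println(s"density(a %*% b) = ${density(c)}")
  }
}
```

Each operand has only n of its n * n entries set, yet every entry of the product is non-zero, because entry (i, j) picks up the term a(i)(0) * b(0)(j). This is why the result type cannot be chosen from the operand flavors alone.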