
Posted to dev@mahout.apache.org by Andrew Palumbo <ap...@outlook.com> on 2016/09/08 15:16:26 UTC

SparseRowMatrices from dense matrix operations

@ssc

Re: SparseRowMatrices from dense operations: some operations use `SparseRowMatrix` as the default accumulator in their combiners, e.g.:

Spark ABt: https://github.com/apache/mahout/blob/master/spark/src/main/scala/org/apache/mahout/sparkbindings/blas/ABt.scala#L296
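To make the concern concrete, here is a minimal Python sketch (not Mahout code) of a combiner whose accumulator is a sparse row structure: a dict mapping row index to a dict of column index to value, standing in for `SparseRowMatrix`. The helper names and the choice to split the shared inner dimension into slices and sum the partial products are illustrative assumptions, not necessarily how ABt.scala partitions its work. With dense operands, every cell of A B^T ends up stored, so the "sparse" accumulator holds a dense result plus per-entry index overhead.

```python
from collections import defaultdict

def new_sparse_accumulator():
    # Hypothetical stand-in for SparseRowMatrix: row index -> {col index: value}.
    # Every stored entry carries its column index, so a fully populated
    # "sparse" matrix costs more memory than a plain dense array.
    return defaultdict(dict)

def col_slice(m, lo, hi):
    # Columns lo..hi-1 of a dense matrix stored as a list of rows.
    return [row[lo:hi] for row in m]

def partial_abt(a_slice, b_slice):
    # One partial product A[:, K] B[:, K]^T over an inner-dimension slice K.
    return [[sum(x * y for x, y in zip(a_row, b_row)) for b_row in b_slice]
            for a_row in a_slice]

def combine(acc, partial):
    # Combiner step: add a dense partial block into the sparse accumulator,
    # creating an entry for every nonzero cell.
    for i, row in enumerate(partial):
        acc_row = acc[i]
        for j, v in enumerate(row):
            if v != 0.0:
                acc_row[j] = acc_row.get(j, 0.0) + v
    return acc

def fill_ratio(acc, nrow, ncol):
    # Fraction of cells the sparse accumulator actually stores.
    return sum(len(r) for r in acc.values()) / (nrow * ncol)

a = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
b = [[1.0, 0.5, 2.0, 1.0],
     [3.0, 1.0, 0.0, 2.0],
     [1.0, 1.0, 1.0, 1.0]]

acc = new_sparse_accumulator()
for lo, hi in [(0, 2), (2, 4)]:  # split the shared inner dimension in two
    combine(acc, partial_abt(col_slice(a, lo, hi), col_slice(b, lo, hi)))

print(fill_ratio(acc, len(a), len(b)))  # 1.0: every cell of A B^T is stored
```

Note that `b` contains a zero, yet the product still has no empty cells: a few zeros in dense inputs rarely survive the multiply-and-sum.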


I believe it was implemented this way so that, in the worst case of an oversized in-core Sparse %*% Dense matrix multiplication, a result that was too large would not throw an OOM error. This is what we created the densityAnalysis(..) method for: to detect the actual density of a matrix on the fly and use the appropriate structure based on the data itself.
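The idea behind on-the-fly density detection can be sketched as: sample a fixed number of random cells instead of scanning the whole matrix, estimate the fraction that are nonzero, and pick a dense structure when the estimate exceeds a threshold. This is a minimal Python sketch assuming a list-of-rows matrix; the names `estimate_density` and `choose_structure`, the sample size of 100 and the 0.25 threshold are illustrative assumptions and are not the signature or defaults of Mahout's densityAnalysis().

```python
import random

def estimate_density(m, sample_size=100, rng=None):
    # Estimate the fraction of nonzero cells by sampling random (row, col)
    # positions instead of scanning the whole matrix.
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    nrow, ncol = len(m), len(m[0])
    hits = 0
    for _ in range(sample_size):
        i = rng.randrange(nrow)
        j = rng.randrange(ncol)
        if m[i][j] != 0.0:
            hits += 1
    return hits / sample_size

def choose_structure(m, threshold=0.25, sample_size=100):
    # Hypothetical decision rule: dense storage when the estimated density
    # exceeds the threshold, sparse row storage otherwise.
    return "dense" if estimate_density(m, sample_size) > threshold else "sparse"

dense = [[float(i + j + 1) for j in range(50)] for i in range(50)]
diag = [[1.0 if i == j else 0.0 for j in range(50)] for i in range(50)]

print(choose_structure(dense))  # dense
print(choose_structure(diag))   # sparse: only ~2% of cells are nonzero
```

Sampling keeps the check cheap relative to the multiplication itself, at the cost of an approximate answer: a matrix near the threshold may land on either side of it.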


Spark ABt does not use densityAnalysis() yet. There is an open Jira to go through and use densityAnalysis() in all appropriate cases: https://issues.apache.org/jira/browse/MAHOUT-1873?filter=-1


So currently, ABt (and possibly some other operations) will return a `SparseRowMatrix` as the product of two dense matrices (if I'm reading it correctly).


It looks like this is a good candidate for densityAnalysis().
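One way a combiner could use density information is to estimate the density of the product before allocating its accumulator. The minimal Python sketch below assumes nonzeros are scattered uniformly and independently; under that assumption a cell of A B^T with inner dimension k is nonzero with probability 1 - (1 - d_A * d_B)^k, ignoring accidental cancellation. The function names and the 0.25 threshold are hypothetical and not part of Mahout's API.

```python
def expected_product_density(d_a, d_b, k):
    # Probability that a cell of A B^T is structurally nonzero, assuming
    # nonzeros are placed uniformly and independently with densities d_a, d_b
    # and ignoring cancellation: 1 - (1 - d_a * d_b) ** k.
    return 1.0 - (1.0 - d_a * d_b) ** k

def choose_accumulator(d_a, d_b, k, threshold=0.25):
    # Hypothetical rule: allocate a dense accumulator when the product is
    # expected to be denser than the threshold, a sparse-row one otherwise.
    return "dense" if expected_product_density(d_a, d_b, k) > threshold else "sparse"

print(choose_accumulator(1.0, 1.0, k=100))      # dense: two dense operands
print(choose_accumulator(0.1, 0.1, k=100))      # dense: ~63% expected fill
print(choose_accumulator(0.001, 0.001, k=100))  # sparse
```

The middle case shows that even moderately sparse operands can produce a mostly dense product. The estimate is only as good as the independence assumption; structured sparsity (e.g. block-diagonal operands) can make it badly wrong, which is an argument for measuring the actual data on the fly rather than predicting it.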