Posted to dev@mahout.apache.org by "Jason Rennie (JIRA)" <ji...@apache.org> on 2008/03/07 05:38:58 UTC
[jira] Commented: (MAHOUT-6) Need a matrix implementation
[ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576034#action_12576034 ]
Jason Rennie commented on MAHOUT-6:
-----------------------------------
Hmm... a HashMap SparseVector implementation is certainly flexible, but also quite inefficient, both in terms of space and in terms of the basic vector/matrix operations (e.g. dot-product). What about a (second?) representation as an int[] of indices and a double[] of values, where the indices are stored in sorted order? This makes dot-products efficient and greatly reduces storage space. 'Course, this makes get/set (very) slow, but I think the tradeoff is worthwhile. At least, when I tested a HashMap implementation (might have been the Colt one), it was completely impractical for my work (way too slow, IIRC). The int[], double[] representation is what I use now and it serves me well.
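To make the int[]/double[] idea concrete, here is a minimal sketch in Java. The class name SortedSparseVector and its methods are hypothetical, made up for illustration; they are not part of any Mahout patch attached to this issue. It assumes the indices are strictly increasing, which lets the dot-product be a single merge-style pass over both vectors.

```java
import java.util.Arrays;

// Hypothetical sketch of a sparse vector stored as parallel arrays:
// a sorted int[] of indices and a double[] of the matching values.
final class SortedSparseVector {
    private final int[] indices;   // strictly increasing positions of non-zero entries
    private final double[] values; // values[k] is the entry at position indices[k]

    SortedSparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        for (int k = 1; k < indices.length; k++) {
            if (indices[k - 1] >= indices[k]) {
                throw new IllegalArgumentException("indices must be strictly increasing");
            }
        }
        // Defensive copies so later changes to the caller's arrays cannot break the ordering.
        this.indices = indices.clone();
        this.values = values.clone();
    }

    // Dot-product as a merge over the two sorted index arrays:
    // advance whichever side has the smaller index, multiply on a match.
    double dot(SortedSparseVector other) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < indices.length && j < other.indices.length) {
            int a = indices[i];
            int b = other.indices[j];
            if (a == b) {
                sum += values[i] * other.values[j];
                i++;
                j++;
            } else if (a < b) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }

    // Random read via binary search; entries that are not stored are zero.
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }
}
```

The dot-product touches each stored entry at most once and does no hashing or boxing. A read can use binary search as above, but a set that introduces a new index would have to shift or reallocate both arrays, which is where the get/set cost mentioned above comes from; the sketch leaves set out for that reason.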
Btw, since there are likely to be multiple implementations of a SparseVector, can we rename SparseVector to SparseVectorHashMap or some such?
> Need a matrix implementation
> ----------------------------
>
> Key: MAHOUT-6
> URL: https://issues.apache.org/jira/browse/MAHOUT-6
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different reducers
> g) a reasonable set of matrix operations should be supported, these should eventually include:
> simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
> row and column sums
> generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations