Posted to dev@mahout.apache.org by "Jason Rennie (JIRA)" <ji...@apache.org> on 2008/03/07 05:38:58 UTC

[jira] Commented: (MAHOUT-6) Need a matrix implementation

    [ https://issues.apache.org/jira/browse/MAHOUT-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576034#action_12576034 ] 

Jason Rennie commented on MAHOUT-6:
-----------------------------------

Hmm... a HashMap-based SparseVector implementation is certainly flexible, but it is also quite inefficient, both in space and in the basic vector/matrix operations (e.g. dot product).  What about a (second?) representation as an int[] of indices and a double[] of values, with the indices stored in sorted order?  This makes dot products efficient and greatly reduces storage space.  Of course, it makes get/set (very) slow, but I think the tradeoff is worthwhile.  At least, when I tested a HashMap implementation (it might have been the Colt one), it was completely impractical for my work (way too slow, IIRC).  The int[], double[] representation is what I use now, and it serves me well.
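
To make the idea concrete, here is a minimal sketch of the sorted int[]/double[] layout.  The class name SortedSparseVector and its methods are hypothetical, chosen only for illustration; this is not code from any of the attached patches and not Mahout's API.  It assumes the indices are strictly increasing, which is what lets the dot product walk both vectors in a single merge-style pass.

```java
import java.util.Arrays;

/** Hypothetical sketch: a sparse vector stored as two parallel arrays. */
final class SortedSparseVector {
    private final int[] indices;   // positions of non-zero entries, strictly increasing
    private final double[] values; // values[k] is the entry at position indices[k]

    SortedSparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have equal length");
        }
        for (int k = 1; k < indices.length; k++) {
            if (indices[k - 1] >= indices[k]) {
                throw new IllegalArgumentException("indices must be strictly increasing");
            }
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    /** Dot product by merging the two sorted index arrays: O(nnz(this) + nnz(other)). */
    double dot(SortedSparseVector other) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < indices.length && j < other.indices.length) {
            int a = indices[i];
            int b = other.indices[j];
            if (a == b) {
                sum += values[i] * other.values[j];
                i++;
                j++;
            } else if (a < b) {
                i++; // index present only in this vector, contributes nothing
            } else {
                j++; // index present only in the other vector
            }
        }
        return sum;
    }

    /** Read one entry by binary search; absent indices are implicit zeros. */
    double get(int index) {
        int pos = Arrays.binarySearch(indices, index);
        return pos >= 0 ? values[pos] : 0.0;
    }
}
```

In this sketch storage is one int and one double per non-zero entry, with no boxing and no hash table overhead.  A read costs a binary search, and a set on an index that is not yet present (not shown) would have to shift or reallocate both arrays, which is the get/set cost mentioned above.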

Btw, since there are likely to be multiple implementations of a SparseVector, can we rename SparseVector to SparseVectorHashMap or some such?


> Need a matrix implementation
> ----------------------------
>
>                 Key: MAHOUT-6
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-6
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-6a.diff, MAHOUT-6b.diff, MAHOUT-6c.diff, MAHOUT-6d.diff, MAHOUT-6e.diff, MAHOUT-6f.diff, MAHOUT-6g.diff, MAHOUT-6h.patch, MAHOUT-6i.diff, MAHOUT-6j.diff
>
>
> We need matrices for Mahout.
> An initial set of basic requirements includes:
> a) sparse and dense support are required
> b) row and column labels are important
> c) serialization for hadoop use is required
> d) reasonable floating point performance is required, but awesome FP is not
> e) the API should be simple enough to understand
> f) it should be easy to carve out sub-matrices for sending to different reducers
> g) a reasonable set of matrix operations should be supported, these should eventually include:
>     simple matrix-matrix and matrix-vector and matrix-scalar linear algebra operations, A B, A + B, A v, A + x, v + x, u + v, dot(u, v)
>     row and column sums  
>     generalized level 2 and 3 BLAS primitives, alpha A B + beta C and A u + beta v
> h) easy and efficient iteration constructs, especially for sparse matrices
> i) easy to extend with new implementations

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.