Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2014/06/07 06:50:01 UTC

[jira] [Comment Edited] (MAHOUT-1574) SparseRowMatrix needs performance improvement for times()

    [ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020692#comment-14020692 ] 

Dmitriy Lyubimov edited comment on MAHOUT-1574 at 6/7/14 4:48 AM:
------------------------------------------------------------------

I suggest reopening:

(1) First, an added test is failing in the Jenkins build. Not sure whether the failure is legitimate or not.
(2) 

{code}
     if (other instanceof SparseRowMatrix) {
     .... 
{code}

is a good first step but still a bit naive. Not sure it solves Sebastian's case. While SRM %*% SRM is definitely one of the cases, there are also a handful of other cases (e.g. SRM %*% SCM.t, which is organizationally equivalent).

What I meant previously by the absence of a cost-based approach for in-core matrices was more along the lines of what Robin did for the vector optimizations. Instead of asking "are you implementation A?", vector operations carry a cost-interrogating abstraction that asks questions like "what is my cost of traversing non-zeros?" and "what is my cost of random access?"
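
To make the idea concrete, here is a minimal sketch of cost interrogation for a vector dot product. Everything in it is an assumption made for illustration: the names CostVector, VectorCostSketch and Plan, and the cost numbers, are hypothetical and are not taken from Mahout's API. It only shows the shape of the decision: the planner never asks which class an operand is, only what its traversal and lookup costs are.

```java
// Hypothetical sketch: none of these names are taken from Mahout's API.
final class CostVector {
    final int nonZeros;             // number of stored (non-zero) elements
    final double lookupCost;        // estimated cost of one random get(index)
    final double iterationStepCost; // estimated cost of advancing a non-zero iterator

    CostVector(int nonZeros, double lookupCost, double iterationStepCost) {
        this.nonZeros = nonZeros;
        this.lookupCost = lookupCost;
        this.iterationStepCost = iterationStepCost;
    }

    /** Dense storage: every element is stored and get(index) is a plain array read. */
    static CostVector dense(int size) {
        return new CostVector(size, 1.0, 1.0);
    }

    /** Sorted sparse storage: get(index) is a binary search over the non-zeros. */
    static CostVector sequentialSparse(int nonZeros) {
        double log2 = Math.log(Math.max(2, nonZeros)) / Math.log(2);
        return new CostVector(nonZeros, log2, 1.0);
    }
}

final class VectorCostSketch {
    enum Plan { MERGE_BOTH, ITERATE_A_LOOKUP_B, ITERATE_B_LOOKUP_A }

    /** Picks the cheapest dot-product plan by interrogating both operands. */
    static Plan cheapestDotPlan(CostVector a, CostVector b) {
        // Walk both non-zero iterators in lockstep (merge join).
        double merge = a.nonZeros * a.iterationStepCost + b.nonZeros * b.iterationStepCost;
        // Walk one operand's non-zeros and randomly access the other.
        double iterateA = a.nonZeros * (a.iterationStepCost + b.lookupCost);
        double iterateB = b.nonZeros * (b.iterationStepCost + a.lookupCost);

        Plan best = Plan.MERGE_BOTH;
        double bestCost = merge;
        if (iterateA < bestCost) { best = Plan.ITERATE_A_LOOKUP_B; bestCost = iterateA; }
        if (iterateB < bestCost) { best = Plan.ITERATE_B_LOOKUP_A; }
        return best;
    }

    public static void main(String[] args) {
        CostVector sparse = CostVector.sequentialSparse(100);
        CostVector dense = CostVector.dense(1_000_000);
        System.out.println(cheapestDotPlan(sparse, dense));
        System.out.println(cheapestDotPlan(sparse, sparse));
    }
}
```

With these assumed costs, a small sparse vector against a large dense one is best served by iterating the sparse side and looking up the dense side, while two sorted sparse vectors are best merged. The same planner would keep working if a new vector implementation were added, as long as it reported its costs.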

Also, in the current vector optimizations, this process is not a property (method) of the LHS, but rather interrogates both LHS and RHS. Similarly, I guess matrices could be interrogated along the lines of "what is the cost of a row-wise non-zero traversal?", "what is the cost of a column-wise traversal?", "which traversal plan gives the best element locality?", etc.
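
A rough sketch of what such matrix interrogation could look like follows. It is an illustration under stated assumptions, not a proposal for the actual implementation: MatrixCostSketch, Costs, Plan and the cost numbers (1.0 for the cheap direction, 20.0 for the expensive one) are all made up, and the additive cost model is deliberately crude. The point is that a transposed view simply swaps its row and column costs, so SRM %*% SCM.t gets the same plan as SRM %*% SRM without any instanceof check.

```java
// Hypothetical sketch: these types and cost numbers are illustrative, not Mahout classes.
final class MatrixCostSketch {
    /** Per-non-zero traversal cost estimates a matrix could report about itself. */
    static final class Costs {
        final double rowTraverse;    // cost per non-zero when walking a row
        final double columnTraverse; // cost per non-zero when walking a column

        Costs(double rowTraverse, double columnTraverse) {
            this.rowTraverse = rowTraverse;
            this.columnTraverse = columnTraverse;
        }

        /** A transposed view swaps the cheap and the expensive direction. */
        Costs transposed() {
            return new Costs(columnTraverse, rowTraverse);
        }
    }

    // Assumed numbers: row-major sparse storage walks rows cheaply, columns expensively.
    static final Costs SPARSE_ROW = new Costs(1.0, 20.0);
    static final Costs SPARSE_COLUMN = new Costs(20.0, 1.0);

    enum Plan {
        ROWS_OF_A_TIMES_ROWS_OF_B,   // C[i,:] += A[i,k] * B[k,:]
        ROWS_OF_A_DOT_COLUMNS_OF_B,  // C[i,j] = A[i,:] . B[:,j]
        COLUMNS_OF_A_OUTER_ROWS_OF_B // C += A[:,k] * B[k,:]
    }

    /** Chooses a traversal plan for A times B by interrogating both LHS and RHS. */
    static Plan plan(Costs a, Costs b) {
        double rowRow = a.rowTraverse + b.rowTraverse;
        double rowCol = a.rowTraverse + b.columnTraverse;
        double colRow = a.columnTraverse + b.rowTraverse;

        Plan best = Plan.ROWS_OF_A_TIMES_ROWS_OF_B;
        double bestCost = rowRow;
        if (rowCol < bestCost) { best = Plan.ROWS_OF_A_DOT_COLUMNS_OF_B; bestCost = rowCol; }
        if (colRow < bestCost) { best = Plan.COLUMNS_OF_A_OUTER_ROWS_OF_B; }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(plan(SPARSE_ROW, SPARSE_ROW));
        System.out.println(plan(SPARSE_ROW, SPARSE_COLUMN.transposed()));
        System.out.println(plan(SPARSE_ROW, SPARSE_COLUMN));
    }
}
```

A real planner would also need to weigh non-zero counts and element locality, which this sketch leaves out.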





> SparseRowMatrix needs performance improvement for times()
> ---------------------------------------------------------
>
>                 Key: MAHOUT-1574
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1574
>             Project: Mahout
>          Issue Type: Bug
>            Reporter: Ted Dunning
>
> According to ssc,
> > * SparseRowMatrix with sequential vectors times SparseRowMatrix with
> > sequential vectors is totally broken, it uses three nested loops and uses
> > get(row, col) on the matrices, which internally uses binary search...



--
This message was sent by Atlassian JIRA
(v6.2#6252)