Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (JIRA)" <ji...@apache.org> on 2014/06/07 06:50:01 UTC
[jira] [Comment Edited] (MAHOUT-1574) SparseRowMatrix needs
performance improvement for times()
[ https://issues.apache.org/jira/browse/MAHOUT-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020692#comment-14020692 ]
Dmitriy Lyubimov edited comment on MAHOUT-1574 at 6/7/14 4:48 AM:
------------------------------------------------------------------
I suggest reopening:
(1) First, an added test is failing in the Jenkins build. Not sure whether legitimately or not.
(2)
{code}
if (other instanceof SparseRowMatrix) {
....
{code}
is a good first step but still a bit naive. Not sure it solves Sebastian's case. While SRM %*% SRM is definitely one of the cases, there is also a handful of other cases (e.g. SRM %*% SCM.t, which is organizationally equivalent).
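To illustrate why the row-organized cases are equivalent, here is a minimal sketch of a row-wise product that only walks non-zeros and never calls get(row, col). SparseRow and RowWiseTimes are hypothetical stand-ins, not Mahout classes, and the dense result array is a simplification. A transposed sparse column matrix exposes the same row-wise layout, so the same loop would presumably apply to it.

```java
import java.util.Arrays;

/** Hypothetical stand-in for one sparse row: column indices plus their values. */
final class SparseRow {
  final int[] indices;
  final double[] values;

  SparseRow(int[] indices, double[] values) {
    this.indices = indices;
    this.values = values;
  }
}

final class RowWiseTimes {
  /**
   * Computes C = A * B where both operands are given as arrays of sparse rows.
   * Row i of C is the sum of a[i][k] * (row k of B) over the non-zeros of row i of A.
   * Only stored non-zeros are visited; there are no per-element lookups.
   */
  static double[][] times(SparseRow[] a, SparseRow[] b, int bColumns) {
    double[][] c = new double[a.length][bColumns];
    for (int i = 0; i < a.length; i++) {
      SparseRow aRow = a[i];
      for (int p = 0; p < aRow.indices.length; p++) {
        int k = aRow.indices[p];        // column of A, i.e. row of B
        double aik = aRow.values[p];
        SparseRow bRow = b[k];
        for (int q = 0; q < bRow.indices.length; q++) {
          c[i][bRow.indices[q]] += aik * bRow.values[q];
        }
      }
    }
    return c;
  }

  public static void main(String[] args) {
    // A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
    SparseRow[] a = {
      new SparseRow(new int[] {0}, new double[] {1.0}),
      new SparseRow(new int[] {1}, new double[] {2.0})
    };
    SparseRow[] b = {
      new SparseRow(new int[] {1}, new double[] {3.0}),
      new SparseRow(new int[] {0}, new double[] {4.0})
    };
    System.out.println(Arrays.deepToString(times(a, b, 2)));
  }
}
```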
What I meant previously by the absence of a cost-based approach for in-core matrices was more along the lines of what Robin did for the vector optimizations. Instead of asking "are you implementation A?", vector operations carry a cost-interrogating abstraction that asks questions like "what is my cost of traversing non-zeros?" and "what is my cost of random access?"
Also, in the current vector optimizations this process is not a property (method) of the LHS, but rather interrogates both LHS and RHS. Similarly, I guess matrices could be interrogated along the lines of "what is the cost of a row-wise non-zero traversal?", "what is the cost of a column-wise traversal?", "what traversal plan gives the best element locality?", etc.
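A rough sketch of what such an interrogation could look like for matrices follows. Everything here is hypothetical: MatrixCosts, TimesPlan, TimesPlanner and FixedCosts are illustrative names rather than existing Mahout API, and summing the two operands' costs is only a placeholder for a real cost model. The point is that the plan is picked by asking both LHS and RHS about their costs instead of testing instanceof.

```java
/** Hypothetical cost-interrogation interface; not part of the Mahout API. */
interface MatrixCosts {
  double rowWiseNonZeroCost();     // cost of traversing non-zeros row by row
  double columnWiseNonZeroCost();  // cost of traversing non-zeros column by column
  double randomAccessCost();       // cost of get(row, col)-style access
}

/** Candidate traversal plans for times(); an illustrative, incomplete list. */
enum TimesPlan { ROW_BY_ROW, ROW_BY_COLUMN, RANDOM_ACCESS }

/** Simple value holder used only to exercise the planner. */
final class FixedCosts implements MatrixCosts {
  private final double rowWise;
  private final double columnWise;
  private final double random;

  FixedCosts(double rowWise, double columnWise, double random) {
    this.rowWise = rowWise;
    this.columnWise = columnWise;
    this.random = random;
  }

  public double rowWiseNonZeroCost() { return rowWise; }
  public double columnWiseNonZeroCost() { return columnWise; }
  public double randomAccessCost() { return random; }
}

final class TimesPlanner {
  /** Picks the cheapest plan by interrogating both operands (placeholder cost model). */
  static TimesPlan choose(MatrixCosts lhs, MatrixCosts rhs) {
    // Accumulate scaled rows of RHS while walking rows of LHS.
    double rowByRow = lhs.rowWiseNonZeroCost() + rhs.rowWiseNonZeroCost();
    // Dot products of LHS rows with RHS columns.
    double rowByColumn = lhs.rowWiseNonZeroCost() + rhs.columnWiseNonZeroCost();
    // Fallback: nested loops with per-element lookups on both sides.
    double random = lhs.randomAccessCost() + rhs.randomAccessCost();

    TimesPlan best = TimesPlan.ROW_BY_ROW;
    double bestCost = rowByRow;
    if (rowByColumn < bestCost) {
      best = TimesPlan.ROW_BY_COLUMN;
      bestCost = rowByColumn;
    }
    if (random < bestCost) {
      best = TimesPlan.RANDOM_ACCESS;
    }
    return best;
  }

  public static void main(String[] args) {
    // Made-up numbers: row-organized storage is cheap by row, expensive by column.
    MatrixCosts rowOrganized = new FixedCosts(1, 100, 10);
    MatrixCosts columnOrganized = new FixedCosts(100, 1, 10);
    System.out.println(choose(rowOrganized, rowOrganized));
    System.out.println(choose(rowOrganized, columnOrganized));
  }
}
```

With this shape, a new matrix implementation would only need to report its own costs; the planner would not need another instanceof branch.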
> SparseRowMatrix needs performance improvement for times()
> ---------------------------------------------------------
>
> Key: MAHOUT-1574
> URL: https://issues.apache.org/jira/browse/MAHOUT-1574
> Project: Mahout
> Issue Type: Bug
> Reporter: Ted Dunning
>
> According to ssc,
> > * SparseRowMatrix with sequential vectors times SparseRowMatrix with
> > sequential vectors is totally broken, it uses three nested loops and uses
> > get(row, col) on the matrices, which internally uses binary search...
--
This message was sent by Atlassian JIRA
(v6.2#6252)