Posted to issues@mahout.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/10/03 01:29:00 UTC

[jira] [Commented] (MAHOUT-2019) SparseRowMatrix assign ops use for loops instead of iterateNonZero and so can be optimized

    [ https://issues.apache.org/jira/browse/MAHOUT-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189151#comment-16189151 ] 

ASF GitHub Bot commented on MAHOUT-2019:
----------------------------------------

GitHub user pferrel opened a pull request:

    https://github.com/apache/mahout/pull/342

    MAHOUT-2019 Sparse speedup

    ### Purpose of PR:
    To review an apparent speedup of spark-itemsimilarity and the underlying SimilarityAnalysis.cooccurrence, obtained by using iterateNonZero instead of the previous for loops in SparseRowMatrix.
    
    For discussion only at present.
    
    MAHOUT-2019
    https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2019?filter=allopenissues&orderby=priority+DESC%2C+updated+DESC
    
    ### Important ToDos
    Please mark each with an "x"
    - [x] A JIRA ticket exists (if not, please create this first)[https://issues.apache.org/jira/browse/ZEPPELIN/]
    - [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX is the JIRA number.
    - [ ] Created unit tests where appropriate
    - [ ] Added correct licenses on newly added files
    - [ ] Assigned JIRA to self
    - [ ] Added documentation in scala docs/java docs, and to website
    - [ ] Successfully built and ran all unit tests, verified that all tests pass locally.
    
    If all of these things aren't complete, but you still feel it is
    appropriate to open a PR, please add [WIP] after MAHOUT-XXXX before the
    descriptions- e.g. "MAHOUT-XXXX [WIP] Description of Change"
    
    Does this change break earlier versions?
    
    Is this the beginning of a larger project for which a feature branch should be made?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pferrel/mahout sparse-speedup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #342
    
----
commit 26a2efa65e9f09df358e1021ebf45e3735e2ec6c
Author: pferrel <pa...@occamsmachete.com>
Date:   2017-10-02T18:39:54Z

    minimum speedup fix

commit 9330a2ed6d1211459c57863a5d664377c55aa747
Author: pferrel <pa...@occamsmachete.com>
Date:   2017-10-02T19:27:47Z

    minimum speedup fix with cast exception check

commit 722bd11f01e7250f99f21f17ec7211bf5abb2089
Author: pferrel <pa...@occamsmachete.com>
Date:   2017-10-02T20:33:07Z

    added cast exception logging to SparseRowMatrix

commit 02700ef13c44e403cba58288dcbab5cfabed8585
Author: pferrel <pa...@occamsmachete.com>
Date:   2017-10-02T20:35:14Z

    Merge branch 'master' into sparse-speedup

----


> SparseRowMatrix assign ops use for loops instead of iterateNonZero and so can be optimized
> ------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-2019
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2019
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.13.0
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 0.13.1
>
>
> DRMs get blockified into SparseRowMatrix instances if the density is low. But SRM inherits the implementation of methods like "assign" from AbstractMatrix, which uses nested for loops to traverse every cell of every row. For multiplying two matrices that are extremely sparse, the kind of data you see in collaborative filtering, this is extremely wasteful of execution time. It is better to use a sparse vector's iterateNonZero Iterator for some function types.
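
The difference described in the issue can be illustrated with a minimal, self-contained sketch. It does not use Mahout's classes: SparseRows, assignDense and assignNonZero are hypothetical names invented for this illustration, and each row is modelled as a plain map from column index to value. The sketch also assumes that "some function types" means functions with f(0) == 0, since only those leave the unstored zero cells unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

public class SparseAssignSketch {

    /** Hypothetical stand-in for a sparse row matrix; this is NOT Mahout's SparseRowMatrix. */
    public static final class SparseRows {
        private final int numCols;
        // One map per row, holding only the non-zero cells (column index -> value).
        private final List<Map<Integer, Double>> rows = new ArrayList<>();

        public SparseRows(int numRows, int numCols) {
            this.numCols = numCols;
            for (int r = 0; r < numRows; r++) {
                rows.add(new TreeMap<>());
            }
        }

        public double get(int row, int col) {
            return rows.get(row).getOrDefault(col, 0.0);
        }

        public void set(int row, int col, double value) {
            if (value == 0.0) {
                rows.get(row).remove(col); // zeros are never stored
            } else {
                rows.get(row).put(col, value);
            }
        }

        /** Nested loops over every cell, zero or not. Returns the number of cells visited. */
        public long assignDense(DoubleUnaryOperator f) {
            long visited = 0;
            for (int r = 0; r < rows.size(); r++) {
                for (int c = 0; c < numCols; c++) {
                    set(r, c, f.applyAsDouble(get(r, c)));
                    visited++;
                }
            }
            return visited;
        }

        /**
         * Visits stored (non-zero) cells only. Assumption: f(0) == 0, otherwise
         * the skipped zero cells would end up with the wrong value.
         */
        public long assignNonZero(DoubleUnaryOperator f) {
            long visited = 0;
            for (Map<Integer, Double> row : rows) {
                for (Map.Entry<Integer, Double> cell : row.entrySet()) {
                    cell.setValue(f.applyAsDouble(cell.getValue()));
                    visited++;
                }
            }
            return visited;
        }
    }

    public static void main(String[] args) {
        SparseRows a = new SparseRows(1000, 1000);
        a.set(0, 7, 1.0);
        a.set(42, 42, 2.5);
        a.set(999, 3, -4.0);

        DoubleUnaryOperator twice = x -> 2.0 * x; // satisfies f(0) == 0
        System.out.println("dense visits:    " + a.assignDense(twice));
        System.out.println("non-zero visits: " + a.assignNonZero(twice));
    }
}
```

On this 1000 x 1000 example with three stored values, the nested loops touch 1,000,000 cells while the non-zero traversal touches 3, and both produce the same matrix. A function such as x -> x + 1 would not qualify for the non-zero path, because it changes zero cells as well.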



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)