Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/03/09 19:58:40 UTC

[jira] [Resolved] (SYSTEMML-552) Performance features ALS-CG

     [ https://issues.apache.org/jira/browse/SYSTEMML-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Boehm resolved SYSTEMML-552.
-------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 0.10

> Performance features ALS-CG
> ---------------------------
>
>                 Key: SYSTEMML-552
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-552
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>             Fix For: SystemML 0.10
>
>
> Over a spectrum of data sizes, ALS-CG does not always perform as well as we would expect, due to unnecessary overheads. This task captures the related performance features:
> 1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of iterating through the non-zeros of W and computing dot products leads to repeated (unnecessary) scans of the factors from main memory (see the cache-blocked sketch after this list).
> 2) Preparation of sparse W = (X != 0) w/ intrinsics: For scalar operations with != 0, there is already a special case, but it is unnecessarily conservative. We should realize this with a plain memcopy of the indices and a memset of 1 for the values (see the sketch after this list).
> 3) Flop-aware operator selection for QuaternaryOp: For large ranks, all quaternary operators become genuinely compute-intensive. In these situations, our heuristic of choosing ExecType.CP whenever the operation fits in driver memory does not work well. Hence, we should take the number of floating point operations and the local/cluster degree of parallelism into account when deciding on the execution type (see the heuristic sketch after this list).
> 4) Improved parallel read of sparse binary blocks: Reading sparse binary block matrices with clen > bclen requires a global lock on append and a final sequential sort of the sparse rows. We should use a more fine-grained locking scheme and sort the sparse rows in parallel (see the locking sketch after this list).
> 5) Cache-conscious sparse wsloss, all patterns: Similar to wdivmm (see 1, the same blocking scheme applies), but less critical since it is only executed once per outer iteration.
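>
> A minimal, cache-blocked sketch of the wdivmm-right kernel from 1); the same blocking scheme applies to wsloss in 5). For each non-zero (i,j) of W it computes r = w_ij / (u_i . v_j) and accumulates r * v_j into row i of the result; processing W in column blocks keeps the corresponding rows of V cache-resident instead of re-streaming them from main memory. All names and the data layout are illustrative assumptions, not the actual SystemML kernels:
>
>     import java.util.Arrays;
>
>     class WdivmmSketch {
>         // W in CSR form (wPtr/wIx/wVal, column indices sorted per row);
>         // U (m x k), V (n x k), R (m x k) dense row-major.
>         static void wdivmmRightBlocked(int[] wPtr, int[] wIx, double[] wVal,
>                                        double[] U, double[] V, double[] R,
>                                        int m, int n, int k, int blockSize) {
>             int[] pos = Arrays.copyOf(wPtr, m); // per-row cursor into row i's non-zeros
>             for (int bj = 0; bj < n; bj += blockSize) {   // column block of W = row block of V
>                 int bjMax = Math.min(bj + blockSize, n);
>                 for (int i = 0; i < m; i++) {
>                     int uOff = i * k, p = pos[i];
>                     while (p < wPtr[i + 1] && wIx[p] < bjMax) {
>                         int vOff = wIx[p] * k;
>                         double dot = 0;
>                         for (int l = 0; l < k; l++)       // u_i . v_j
>                             dot += U[uOff + l] * V[vOff + l];
>                         double r = wVal[p] / dot;
>                         for (int l = 0; l < k; l++)       // R_i += r * v_j
>                             R[uOff + l] += r * V[vOff + l];
>                         p++;
>                     }
>                     pos[i] = p; // resume here for the next column block
>                 }
>             }
>         }
>     }
>
> The block size would be chosen such that blockSize * k values of V fit in the last-level cache.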
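>
> A minimal sketch of the "(X != 0) w/ intrinsics" idea from 2) for a single sparse row, assuming X stores no explicit zeros: the output then has exactly the same sparsity pattern with all values equal to 1, so we copy the column indices wholesale and fill the values instead of testing every entry:
>
>     import java.util.Arrays;
>
>     class NeqZeroSketch {
>         // srcIx/dstIx and dstVal are the index and value arrays of one
>         // sparse row (hypothetical layout, not the SystemML SparseRow API)
>         static void neqZeroRow(int[] srcIx, int nnz, int[] dstIx, double[] dstVal) {
>             System.arraycopy(srcIx, 0, dstIx, 0, nnz); // plain memcopy of indices
>             Arrays.fill(dstVal, 0, nnz, 1.0);          // memset 1 for values
>         }
>     }
>
> If X could contain explicitly stored zeros, they would first have to be filtered out, which is exactly what makes the generic scalar-operation path conservative.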
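>
> A hypothetical flop-aware heuristic for 3): instead of picking ExecType.CP whenever the inputs fit in driver memory, estimate the floating point operations of the quaternary operator (O(rank) per non-zero of W) and weigh the local against the cluster degree of parallelism. The cost model and all names are illustrative assumptions, not the actual SystemML compiler code:
>
>     class ExecTypeSketch {
>         enum ExecType { CP, SPARK }
>
>         static ExecType chooseExecType(long nnzW, long rank, boolean fitsInDriverMemory,
>                                        int localPar, int clusterPar, double distOverheadFlops) {
>             if (!fitsInDriverMemory)
>                 return ExecType.SPARK;           // no choice: too large for the driver
>             double flops = 2.0 * nnzW * rank;    // dot product + update per non-zero of W
>             double cpCost   = flops / localPar;
>             double distCost = flops / clusterPar + distOverheadFlops;
>             return (cpCost <= distCost) ? ExecType.CP : ExecType.SPARK;
>         }
>     }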
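>
> A sketch of the finer-grained synchronization for 4), assuming reader threads append different column blocks (clen > bclen) into a shared target: the lock is striped by row-block index instead of being global, and the final sort of the sparse rows runs in parallel. SparseRowLike and the append callback are stand-ins for the actual reader internals:
>
>     import java.util.concurrent.locks.ReentrantLock;
>     import java.util.stream.IntStream;
>
>     class ParallelSparseReadSketch {
>         interface SparseRowLike { void sort(); } // sorts the row's column indices
>
>         private final ReentrantLock[] rowBlockLocks;
>
>         ParallelSparseReadSketch(int numRowBlocks) {
>             rowBlockLocks = new ReentrantLock[numRowBlocks];
>             for (int i = 0; i < numRowBlocks; i++)
>                 rowBlockLocks[i] = new ReentrantLock();
>         }
>
>         // called by reader threads; serializes appends per row block only
>         void append(int rowBlockIx, Runnable appendRows) {
>             ReentrantLock lock = rowBlockLocks[rowBlockIx];
>             lock.lock();
>             try { appendRows.run(); }
>             finally { lock.unlock(); }
>         }
>
>         // after all blocks are appended, sort each sparse row in parallel
>         void sortSparseRows(SparseRowLike[] rows) {
>             IntStream.range(0, rows.length).parallel()
>                      .forEach(i -> rows[i].sort());
>         }
>     }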



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)