Posted to issues@systemml.apache.org by "Deron Eriksson (JIRA)" <ji...@apache.org> on 2016/08/18 19:44:20 UTC

[jira] [Commented] (SYSTEMML-220) New Second-Order Builtin Function 'apply'

    [ https://issues.apache.org/jira/browse/SYSTEMML-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15427040#comment-15427040 ] 

Deron Eriksson commented on SYSTEMML-220:
-----------------------------------------

[~mwdusenb@us.ibm.com] Did the subtasks get created for this JIRA?

> New Second-Order Builtin Function 'apply'
> -----------------------------------------
>
>                 Key: SYSTEMML-220
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-220
>             Project: SystemML
>          Issue Type: Task
>          Components: Compiler, Parser, Runtime
>            Reporter: Matthias Boehm
>   Original Estimate: 160h
>  Remaining Estimate: 160h
>
> In several scripts, there is a need to apply rather complex functions to each cell of a matrix. The most natural way of expressing this, especially if the function involves loops and branches, is a DML-bodied function over scalars with surrounding loops over all cells in the matrix. Below is an artificial example:
> {code}
> foo = function( Double in ) return ( Double out ){
>   x = in^2;
>   if( x > in*2 )
>     x = x - in/3;
>   out = x + 7;
> }
> R = matrix(0, rows=nrow(A), cols=ncol(A));
> for( i in 1:nrow(A) )
>   for( j in 1:ncol(A) )
>     R[i,j] = foo(A[i,j]);
> {code}
> On large data in particular, however, this would cause severe performance problems. Accordingly, people usually "vectorize" these operations by hand, which is unfortunately not easy for very complex functions.
> {code}
> R = A^2 - ppred(A^2, A*2, ">")*A/3 + 7;
> {code}
> For this reason, we would like to integrate a second-order builtin function {{apply}} that would allow users to use their custom functions with reasonable performance. 
> {code}
> R = apply(A, foo);
> {code}
> We would initially constrain this builtin function to DML-bodied functions with (1) single scalar in / single scalar out, (2) no support for nested function invocations, (3) no creation of arbitrarily large intermediates (we assume small memory footprint per cell). These constraints would allow us to provide a very efficient new unary apply operation (multi-threaded in CP, narrow transformation in distributed backends).
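
For illustration only, the semantics the proposed {{apply}} is meant to have can be sketched outside DML. The following is plain Python, not SystemML code, and it says nothing about how the builtin would actually be implemented. The names `apply_cellwise` and `vectorized` are hypothetical, and a matrix is assumed to be a list of rows. The sketch shows that mapping the scalar function over every cell gives the same result as the hand-vectorized expression from the issue description.

```python
# Illustrative sketch in Python (not DML). `apply_cellwise` and `vectorized`
# are hypothetical names; a matrix is modeled as a list of rows of floats.

def foo(x_in: float) -> float:
    """Scalar function from the example: single scalar in, single scalar out."""
    x = x_in ** 2
    if x > x_in * 2:
        x = x - x_in / 3
    return x + 7


def apply_cellwise(A, fn):
    """Apply fn to each cell independently and return a new matrix."""
    return [[fn(cell) for cell in row] for row in A]


def vectorized(A):
    """Mirror of the hand-vectorized form A^2 - ppred(A^2, A*2, ">")*A/3 + 7."""
    return [
        [a ** 2 - (1.0 if a ** 2 > a * 2 else 0.0) * a / 3 + 7 for a in row]
        for row in A
    ]


A = [[0.0, 1.0, 2.0], [3.0, -1.5, 4.0]]
R = apply_cellwise(A, foo)  # corresponds to R = apply(A, foo) in the proposal
```

Because each output cell depends only on the matching input cell, the cells can be processed in any order. That independence is the property the constraints listed in the issue rely on for multi-threaded or distributed execution.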



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)