Posted to issues@systemml.apache.org by "Deron Eriksson (JIRA)" <ji...@apache.org> on 2017/06/29 21:31:00 UTC
[jira] [Closed] (SYSTEMML-1379) Investigate script metadata to simplify MLContext script interaction

     [ https://issues.apache.org/jira/browse/SYSTEMML-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Deron Eriksson closed SYSTEMML-1379.
------------------------------------

> Investigate script metadata to simplify MLContext script interaction
> --------------------------------------------------------------------
>
>                 Key: SYSTEMML-1379
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1379
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms, APIs
>            Reporter: Deron Eriksson
>            Assignee: Deron Eriksson
>             Fix For: Not Applicable
>
>
> Currently many scripts contain usage comments such as the following:
> {code}
> # THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
> # USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT 
> # MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
> #
> # INPUT   PARAMETERS:
> # ---------------------------------------------------------------------------------------------
> # NAME    TYPE     DEFAULT  MEANING
> # ---------------------------------------------------------------------------------------------
> # X       String   ---      Location to read the input matrix X to be factorized
> # U       String   ---      Location to write the factor matrix U
> # V       String   ---      Location to write the factor matrix V
> # rank    Int      10       Rank of the factorization
> # reg     String   "L2"     Regularization:
> #                           "L2" = L2 regularization;
> #                           "wL2" = weighted L2 regularization
> # lambda  Double   0.000001 Regularization parameter, no regularization if 0.0
> # maxi    Int      50       Maximum number of iterations
> # check   Boolean  FALSE    Check for convergence after every iteration, i.e., updating U and V once
> # thr     Double   0.0001   Assuming check is set to TRUE, the algorithm stops and convergence is declared 
> #                           if the decrease in loss in any two consecutive iterations falls below this threshold; 
> #                           if check is FALSE thr is ignored
> # fmt     String   "text"   The output format of the factor matrices U and V, such as "text" or "csv"
> # ---------------------------------------------------------------------------------------------
> # OUTPUT: 
> # 1- An m x r matrix U, where r is the factorization rank 
> # 2- An r x n matrix V
> #
> # HOW TO INVOKE THIS SCRIPT - EXAMPLE:
> # hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
> {code}
> Comments like these are hard to consult from an interactive, programmatic environment such as the Spark Shell. If the same information were provided in a machine-parseable format such as JSON or XML, it could be parsed and exposed programmatically, for example through the MLContext API in the Spark Shell.
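As a rough illustration of the kind of structured metadata the issue has in mind, the sketch below parses the fixed-width INPUT PARAMETERS table from a header comment like the ALS-CG.dml one into records and renders them as JSON. The names `ScriptParamMetadata`, `Param`, `parseParams`, and `toJson` are hypothetical and are not part of SystemML or the MLContext API. The sketch assumes that the table rows sit between the second and third dashed separator lines, that columns are whitespace-separated, and that indented lines continue the previous MEANING entry. It also treats a `---` default as "no default" (rendered as `null`), which is an interpretation rather than a documented convention. It requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not part of SystemML/MLContext): extracts the
 * fixed-width "INPUT PARAMETERS" table from a DML header comment into
 * structured records and renders them as JSON.
 */
public class ScriptParamMetadata {

    /** One table row: NAME, TYPE, DEFAULT (null when "---"), MEANING. */
    public record Param(String name, String type, String defaultValue, String meaning) {}

    /** Parses the rows found between the 2nd and 3rd dashed separator lines. */
    public static List<Param> parseParams(List<String> commentLines) {
        List<Param> params = new ArrayList<>();
        int separators = 0;
        for (String raw : commentLines) {
            // Drop the DML comment marker "#" and the single space after it.
            String line = raw.startsWith("#") ? raw.substring(1) : raw;
            if (line.startsWith(" ")) {
                line = line.substring(1);
            }
            String trimmed = line.trim();
            if (trimmed.matches("-{5,}")) {       // dashed separator line
                separators++;
                if (separators == 3) {
                    break;                         // end of the parameter table
                }
                continue;
            }
            if (separators != 2 || trimmed.isEmpty()) {
                continue;                          // outside the table, or blank
            }
            if (Character.isWhitespace(line.charAt(0))) {
                // Indented line: continuation of the previous MEANING column.
                if (!params.isEmpty()) {
                    Param last = params.remove(params.size() - 1);
                    params.add(new Param(last.name(), last.type(), last.defaultValue(),
                            last.meaning() + " " + trimmed));
                }
                continue;
            }
            String[] cols = trimmed.split("\\s+", 4);
            if (cols.length < 3) {
                continue;                          // not a well-formed row
            }
            String def = cols[2].equals("---") ? null : cols[2];
            String meaning = cols.length == 4 ? cols[3].trim() : "";
            params.add(new Param(cols[0], cols[1], def, meaning));
        }
        return params;
    }

    /** Quotes a string for JSON; minimal escaping (backslash and double quote only). */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders the parameters as a JSON array; a missing default becomes null. */
    public static String toJson(List<Param> params) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < params.size(); i++) {
            Param p = params.get(i);
            if (i > 0) {
                sb.append(",");
            }
            sb.append("{\"name\":").append(quote(p.name()))
              .append(",\"type\":").append(quote(p.type()))
              .append(",\"default\":")
              .append(p.defaultValue() == null ? "null" : quote(p.defaultValue()))
              .append(",\"meaning\":").append(quote(p.meaning()))
              .append("}");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // A few rows copied from the ALS-CG.dml header comment quoted in the issue.
        List<String> header = List.of(
            "# INPUT   PARAMETERS:",
            "# ------------------------------------------",
            "# NAME    TYPE     DEFAULT  MEANING",
            "# ------------------------------------------",
            "# X       String   ---      Location to read the input matrix X to be factorized",
            "# rank    Int      10       Rank of the factorization",
            "# reg     String   \"L2\"     Regularization: ",
            "#                           \"L2\" = L2 regularization;",
            "#                           \"wL2\" = weighted L2 regularization",
            "# ------------------------------------------");
        System.out.println(toJson(parseParams(header)));
    }
}
```

Scraping existing comment tables like this is fragile. It breaks as soon as a script deviates from the column layout, which is why the issue leans toward authoring the metadata directly in a parseable format rather than recovering it from free-form comments.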



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)