Posted to issues@systemml.apache.org by "Deron Eriksson (JIRA)" <ji...@apache.org> on 2017/03/09 19:52:38 UTC
[jira] [Created] (SYSTEMML-1379) Investigate script metadata to simplify MLContext script interaction
Deron Eriksson created SYSTEMML-1379:
----------------------------------------
Summary: Investigate script metadata to simplify MLContext script interaction
Key: SYSTEMML-1379
URL: https://issues.apache.org/jira/browse/SYSTEMML-1379
Project: SystemML
Issue Type: Improvement
Components: Algorithms, APIs
Reporter: Deron Eriksson
Assignee: Deron Eriksson
Currently many scripts contain usage comments such as the following:
{code}
# THIS SCRIPT COMPUTES AN APPROXIMATE FACTORIZATION OF A LOW-RANK MATRIX X INTO TWO MATRICES U AND V
# USING ALTERNATING-LEAST-SQUARES (ALS) ALGORITHM WITH CONJUGATE GRADIENT
# MATRICES U AND V ARE COMPUTED BY MINIMIZING A LOSS FUNCTION (WITH REGULARIZATION)
#
# INPUT PARAMETERS:
# ---------------------------------------------------------------------------------------------
# NAME    TYPE     DEFAULT   MEANING
# ---------------------------------------------------------------------------------------------
# X       String   ---       Location to read the input matrix X to be factorized
# U       String   ---       Location to write the factor matrix U
# V       String   ---       Location to write the factor matrix V
# rank    Int      10        Rank of the factorization
# reg     String   "L2"      Regularization:
#                            "L2" = L2 regularization;
#                            "wL2" = weighted L2 regularization
# lambda  Double   0.000001  Regularization parameter, no regularization if 0.0
# maxi    Int      50        Maximum number of iterations
# check   Boolean  FALSE     Check for convergence after every iteration, i.e., updating U and V once
# thr     Double   0.0001    Assuming check is set to TRUE, the algorithm stops and convergence is declared
#                            if the decrease in loss in any two consecutive iterations falls below this threshold;
#                            if check is FALSE thr is ignored
# fmt     String   "text"    The output format of the factor matrices U and V, such as "text" or "csv"
# ---------------------------------------------------------------------------------------------
# OUTPUT:
# 1- An m x r matrix U, where r is the factorization rank
# 2- An r x n matrix V
#
# HOW TO INVOKE THIS SCRIPT - EXAMPLE:
# hadoop jar SystemML.jar -f ALS-CG.dml -nvargs X=INPUT_DIR/X U=OUTPUT_DIR/U V=OUTPUT_DIR/V rank=10 reg="L2" lambda=0.0001 fmt=csv
{code}
Comments like these are hard to use from a programmatic, interactive environment such as the Spark Shell. If the same information were provided in a parseable format such as JSON or XML, it could be parsed and exposed programmatically, for example through the MLContext API in the Spark Shell.
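As a rough illustration of the idea, the sketch below encodes the ALS-CG parameter table as JSON and uses it to list required inputs and fill in defaults. Everything about the format is an assumption made for this example: the key names ("script", "inputs", "name", "type", "default", "description") and the helper functions required_inputs and resolve_arguments are hypothetical and are not part of SystemML or the MLContext API.

```python
import json

# Hypothetical metadata for ALS-CG.dml. The schema is an assumption made for
# illustration only; SystemML does not define this format.
ALS_CG_METADATA = """
{
  "script": "ALS-CG.dml",
  "inputs": [
    {"name": "X", "type": "String", "default": null,
     "description": "Location to read the input matrix X to be factorized"},
    {"name": "U", "type": "String", "default": null,
     "description": "Location to write the factor matrix U"},
    {"name": "V", "type": "String", "default": null,
     "description": "Location to write the factor matrix V"},
    {"name": "rank", "type": "Int", "default": 10,
     "description": "Rank of the factorization"},
    {"name": "reg", "type": "String", "default": "L2",
     "description": "Regularization: L2 or wL2 (weighted L2)"},
    {"name": "lambda", "type": "Double", "default": 0.000001,
     "description": "Regularization parameter, no regularization if 0.0"},
    {"name": "maxi", "type": "Int", "default": 50,
     "description": "Maximum number of iterations"},
    {"name": "check", "type": "Boolean", "default": false,
     "description": "Check for convergence after every iteration"},
    {"name": "thr", "type": "Double", "default": 0.0001,
     "description": "Convergence threshold, ignored if check is FALSE"},
    {"name": "fmt", "type": "String", "default": "text",
     "description": "Output format of the factor matrices, such as text or csv"}
  ]
}
"""


def required_inputs(metadata):
    """Return the names of parameters that declare no default value."""
    return [p["name"] for p in metadata["inputs"] if p["default"] is None]


def resolve_arguments(metadata, supplied):
    """Merge caller-supplied values over the declared defaults.

    Raises ValueError for unknown parameter names or missing required ones.
    """
    known = {p["name"] for p in metadata["inputs"]}
    unknown = sorted(set(supplied) - known)
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))
    missing = [n for n in required_inputs(metadata) if n not in supplied]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    resolved = {p["name"]: p["default"] for p in metadata["inputs"]}
    resolved.update(supplied)
    return resolved


metadata = json.loads(ALS_CG_METADATA)
args = resolve_arguments(
    metadata,
    {"X": "INPUT_DIR/X", "U": "OUTPUT_DIR/U", "V": "OUTPUT_DIR/V", "fmt": "csv"},
)
```

With metadata in this shape, an interactive session could list a script's parameters, report which ones are mandatory, and validate named arguments before the script runs, none of which is practical when the same information lives only in free-form comments.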
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)