Posted to commits@systemml.apache.org by ja...@apache.org on 2020/06/13 09:19:31 UTC
[systemml] branch master updated: [DOC] Documentation for builtin glm function

This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemml.git


The following commit(s) were added to refs/heads/master by this push:
     new 30b6f27  [DOC] Documentation for builtin glm function
30b6f27 is described below

commit 30b6f2746682100fab4cfe785d777604902f8da2
Author: Supratick Dey <su...@gmail.com>
AuthorDate: Sat Jun 13 14:43:56 2020 +0530

    [DOC] Documentation for builtin glm function
    
    Closes #968.
---
 dev/docs/builtins-reference.md | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/dev/docs/builtins-reference.md b/dev/docs/builtins-reference.md
index e72251d..a92fa75 100644
--- a/dev/docs/builtins-reference.md
+++ b/dev/docs/builtins-reference.md
@@ -24,6 +24,7 @@ limitations under the License.
     * [`tensor`-Function](#tensor-function)
   * [DML-Bodied Built-In functions](#dml-bodied-built-in-functions)
     * [`confusionMatrix`-Function](#confusionmatrix-function)
+    * [`glm`-Function](#glm-function)
     * [`gridSearch`-Function](#gridSearch-function)
     * [`KMeans`-Function](#KMeans-function)
     * [`lm`-Function](#lm-function)
@@ -158,6 +159,45 @@ y = toOneHot(X, numClasses)
 [ConfusionSum, ConfusionAvg] = confusionMatrix(P=z, Y=y)
 ```
 
+## `glm`-Function
+
+The `glm`-function fits a generalized linear model (GLM), a flexible generalization of ordinary linear regression that
+allows the response variable to follow an error distribution other than the normal distribution.
+
+### Usage
+```r
+glm(X,Y)
+```
+
+### Arguments
+| Name | Type           | Default  | Description |
+| :--- | :------------- | :------- | :---------- |
+| X    | Matrix[Double] | required | matrix X of feature vectors |
+| Y    | Matrix[Double] | required | matrix Y with either 1 or 2 columns: if dfam = 2, Y is 1-column Bernoulli or 2-column Binomial (#pos, #neg) |
+| dfam | Int            | `1`      | Distribution family code: 1 = Power, 2 = Binomial |
+| vpow | Double         | `0.0`    | Power for Variance defined as (mean)^power (ignored if dfam != 1):  0.0 = Gaussian, 1.0 = Poisson, 2.0 = Gamma, 3.0 = Inverse Gaussian |
+| link | Int            | `0`      | Link function code: 0 = canonical (depends on distribution), 1 = Power, 2 = Logit, 3 = Probit, 4 = Cloglog, 5 = Cauchit |
+| lpow | Double         | `1.0`    | Power for Link function defined as (mean)^power (ignored if link != 1):  -2.0 = 1/mu^2, -1.0 = reciprocal, 0.0 = log, 0.5 = sqrt, 1.0 = identity |
+| yneg | Double         | `0.0`    | Response value for Bernoulli "No" label, usually 0.0 or -1.0 |
+| icpt | Int            | `0`      | Intercept presence, X columns shifting and rescaling: 0 = no intercept, no shifting, no rescaling; 1 = add intercept, but neither shift nor rescale X; 2 = add intercept, shift & rescale X columns to mean = 0, variance = 1 |
+| reg  | Double         | `0.0`    | Regularization parameter (lambda) for L2 regularization |
+| tol  | Double         | `1e-6`   | Tolerance (epsilon) value |
+| disp | Double         | `0.0`    | (Over-)dispersion value, or 0.0 to estimate it from data |
+| moi  | Int            | `200`    | Maximum number of outer (Newton / Fisher Scoring) iterations |
+| mii  | Int            | `0`      | Maximum number of inner (Conjugate Gradient) iterations, 0 = no maximum |
+
+### Returns
+| Type           | Description      |
+| :------------- | :--------------- |
+| Matrix[Double] | Matrix whose size depends on icpt (icpt = 0: ncol(X) x 1; icpt = 1: (ncol(X) + 1) x 1; icpt = 2: (ncol(X) + 1) x 2) |
+
+### Example
+```r
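+X = rand(rows = 5, cols = 5)              # 5 x 5 random feature matrix
+y = X %*% rand(rows = ncol(X), cols = 1)  # response built as a linear combination of the columns of X
+beta = glm(X = X, Y = y)                  # fit with the defaults (dfam = 1, vpow = 0.0, i.e. Gaussian)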
+```
+
 ## `gridSearch`-Function
 
 The `gridSearch`-function is used to find the optimal hyper-parameters of a model which results in the most _accurate_