Posted to commits@systemml.apache.org by ja...@apache.org on 2019/03/27 20:02:49 UTC

[systemml] branch master updated: [SYSTEMML-1437][DOC] Document Factorization machines

This is an automated email from the ASF dual-hosted git repository.

janardhan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/systemml.git


The following commit(s) were added to refs/heads/master by this push:
     new ac5036b  [SYSTEMML-1437][DOC] Document Factorization machines
ac5036b is described below

commit ac5036b1338bd078cff1bb02f1ae75ec12e3b98d
Author: Janardhan Pulivarthi <j1...@protonmail.com>
AuthorDate: Thu Mar 28 01:25:34 2019 +0530

    [SYSTEMML-1437][DOC] Document Factorization machines
    
    - This patch documents the technical details of the factorization model
    - Also, a binary classification and a regression script with suitable examples
---
 docs/algorithms-factorization-machines.md | 226 ++++++++++++++++++++++++++++++
 docs/algorithms-reference.md              |   3 +
 2 files changed, 229 insertions(+)

diff --git a/docs/algorithms-factorization-machines.md b/docs/algorithms-factorization-machines.md
new file mode 100644
index 0000000..3a380d3
--- /dev/null
+++ b/docs/algorithms-factorization-machines.md
@@ -0,0 +1,226 @@
+---
+layout: global
+title: SystemML Algorithms Reference - Factorization Machines
+displayTitle: <a href="algorithms-reference.html">SystemML Algorithms Reference</a>
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+# 7. Factorization Machines
+
+### Description
+
+The Factorization Machine (FM) is a general predictor, like SVMs, but is also
+able to estimate reliable parameters under very high sparsity. The factorization machine
+models all nested variable interactions (comparable to a polynomial kernel in SVM), but
+uses a factorized parameterization instead of a dense parameterization as in SVMs.
+
+## Core Model
+
+### 1. Model Equation:
+
+$$ \hat{y}(x) =
+w_0 +
+\sum_{i=1}^{n} w_i x_i +
+\sum_{i=1}^{n} \sum_{j=i+1}^{n} \left <v_i, v_j \right > x_i x_j
+$$
+
+ where the model parameters that have to be estimated are:
+ $$
+ w_0 \in R,
+ W   \in R^n,
+ V   \in R^{n \times k}
+ $$
+
+and
+ $$
+   \left <\cdot, \cdot \right >
+ $$
+is the dot product of two vectors of size $$k$$:
+ $$
+ \left <v_i, v_j \right > = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}
+ $$
+
+A row $$ v_i $$ within $$ V $$ describes the $$i$$th variable with $$k \in N_0^+$$ factors. $$k$$ is a hyperparameter that
+defines the dimensionality of the factorization.
+
+ * $$ w_0 $$ : global bias
+ * $$ w_i $$ : models the strength of the $$i$$th variable
+ * $$ w_{i,j} = \left <v_i, v_j \right> $$ : models the interaction between the $$i$$th & $$j$$th variables.
+
+Instead of using its own model parameter $$ w_{i,j} \in R $$ for each interaction, the FM
+models the interaction by factorizing it.
+
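The model equation can be evaluated directly, term by term. The following NumPy sketch is purely illustrative (it is not the SystemML DML implementation, and the function name `fm_predict` is hypothetical); it computes the bias, the linear terms, and all pairwise factorized interactions with a naive double loop:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Naive O(k * n^2) evaluation of the FM model equation.

    x  : feature vector of shape (n,)
    w0 : global bias (scalar)
    w  : linear weights, shape (n,)
    V  : factor matrix, shape (n, k)
    """
    n = x.shape[0]
    y_hat = w0 + w @ x                        # bias + linear terms
    for i in range(n):
        for j in range(i + 1, n):
            # interaction weight w_{i,j} is the dot product <v_i, v_j>
            y_hat += (V[i] @ V[j]) * x[i] * x[j]
    return y_hat
```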
+### 2. Expressiveness:
+
+It is well known that for any positive definite matrix $$W$$, there exists a matrix $$V$$ such that
+$$W = V \cdot V^t$$, provided that $$k$$ is sufficiently large. This shows that an FM can express any
+interaction matrix $$W$$ if $$k$$ is chosen large enough.
+
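This factorization can be demonstrated concretely: an eigendecomposition of a symmetric positive definite $$W$$ yields one valid $$V$$. A small NumPy sketch (illustrative only, not part of the SystemML scripts):

```python
import numpy as np

# Any symmetric positive definite W factors as W = V V^T;
# an eigendecomposition gives one such V explicitly.
W = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric positive definite
eigvals, eigvecs = np.linalg.eigh(W)
V = eigvecs * np.sqrt(eigvals)      # scale each eigenvector column
assert np.allclose(V @ V.T, W)      # W is recovered exactly
```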
+### 3. Parameter Estimation Under Sparsity:
+
+In sparse settings, there is usually not enough data to estimate interactions between variables
+directly & independently. FMs can estimate interactions well even in these settings, because
+they break the independence of the interaction parameters by factorizing them.
+
+### 4. Computation:
+
+Due to the factorization of pairwise interactions, no model parameter directly depends
+on two variables (e.g., a parameter with an index $$(i,j)$$). So the pairwise interactions can be
+reformulated as shown below.
+
+$$
+\sum_{i=1}^n \sum_{j=i+1}^n \left <v_i, v_j \right > x_i x_j
+$$
+
+$$
+= {1 \over 2} \sum_{i=1}^n \sum_{j=1}^n \left <v_i, v_j \right > x_i x_j - {1 \over 2} \sum_{i=1}^n \left <v_i, v_i \right > x_i x_i
+$$
+
+$$
+= {1 \over 2} \left ( \sum_{i=1}^n \sum_{j=1}^n \sum_{f=1}^k v_{i,f} v_{j,f} x_i x_j - \sum_{i=1}^n \sum_{f=1}^k v_{i,f} v_{i,f} x_i x_i \right )
+$$
+
+$$
+= {1 \over 2} \sum_{f=1}^k \left ( \left (\sum_{i=1}^n v_{i,f} x_i \right ) \left (\sum_{j=1}^n v_{j,f} x_j \right ) - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right )
+$$
+
+$$
+= {1 \over 2} \sum_{f=1}^k \left ( \left ( \sum_{i=1}^n v_{i,f} x_i \right )^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right )
+$$
+
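The reformulation reduces the cost of the interaction term from quadratic to linear in the number of features. The equivalence of the two forms can be checked numerically with a short NumPy sketch (illustrative, not the DML implementation):

```python
import numpy as np

def pairwise_naive(x, V):
    """Direct double loop over all pairs: O(k * n^2)."""
    n = x.shape[0]
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += (V[i] @ V[j]) * x[i] * x[j]
    return s

def pairwise_linear(x, V):
    """Reformulated sum, linear in n:
    1/2 * sum_f ((sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2)."""
    return 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
V = rng.standard_normal((5, 3))
assert np.isclose(pairwise_naive(x, V), pairwise_linear(x, V))
```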
+### 5. Learning Factorization Machines
+The gradient of the FM model with respect to each of its parameters is
+$$
+{ \delta \over \delta \theta } \hat{y}(x) =
+\begin{cases}
+1 & \text{if } \theta \text{ is } w_0 \\
+x_i & \text{if } \theta \text{ is } w_i \\
+x_i \sum_{j=1}^{n} v_{j,f} \cdot x_j - v_{i,f} \cdot x_i^2 & \text{if } \theta \text{ is } v_{i,f}
+\end{cases}
+$$
+
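The gradient for the factor weights can be verified against a finite-difference approximation of the reformulated pairwise term. An illustrative NumPy sketch (function names are hypothetical, not from the SystemML scripts):

```python
import numpy as np

def pairwise(x, V):
    # O(kn) form of the pairwise interaction term
    return 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))

def grad_v(x, V, i, f):
    # Analytic gradient of the pairwise term w.r.t. v_{i,f}:
    # x_i * sum_j v_{j,f} x_j - v_{i,f} x_i^2
    return x[i] * (V[:, f] @ x) - V[i, f] * x[i] ** 2

# Central finite-difference check: perturb one entry of V
rng = np.random.default_rng(1)
x = rng.standard_normal(4)
V = rng.standard_normal((4, 2))
i, f, eps = 2, 1, 1e-6
Vp = V.copy(); Vp[i, f] += eps
Vm = V.copy(); Vm[i, f] -= eps
numeric = (pairwise(x, Vp) - pairwise(x, Vm)) / (2 * eps)
assert np.isclose(grad_v(x, V, i, f), numeric, atol=1e-6)
```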
+### 6. Factorization Models as Predictors:
+
+### 6.1. Regression:
+ $$\hat{y}(x)$$ can be used directly as the predictor, and the optimization criterion is the minimal
+least-squares error on $$D$$.
+
+### Usage:
+The `train()` function in the [fm-regression.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-regression.dml) script takes in the input feature matrix and the corresponding target vector, with some of the input held out for validation during training.
+```
+train = function(matrix[double] X, matrix[double] y, matrix[double] X_val, matrix[double] y_val)
+    return (matrix[double] w0, matrix[double] W, matrix[double] V) {
+  /*
+   * Trains the FM model.
+   *
+   * Inputs:
+   *  - X     : n examples with d features, of shape (n, d)
+   *  - y     : Target matrix, of shape (n, 1)
+   *  - X_val : Input validation data matrix, of shape (n, d)
+   *  - y_val : Target validation matrix, of shape (n, 1)
+   *
+   * Outputs:
+   *  - w0, W, V : updated model parameters.
+   *
+   * Network Architecture:
+   *
+   * X --> [model] --> out --> l2_loss::backward(out, y) --> dout
+   *
+   */
+   
+   ...
+   # 7. Call adam::update for all parameters
+   [w0,mw0,vw0] = adam::update(w0, dw0, lr, beta1, beta2, epsilon, t, mw0, vw0);
+   [W, mW, vW]  = adam::update(W, dW, lr, beta1, beta2, epsilon, t, mW, vW );
+   [V, mV, vV]  = adam::update(V, dV, lr, beta1, beta2, epsilon, t, mV, vV );
+
+}
+```
+Once the `train` function returns the weights for the `fm` model, these are passed to the `predict` function.
+
+```
+predict = function(matrix[double] X, matrix[double] w0, matrix[double] W, matrix[double] V)
+    return (matrix[double] out) {
+  /*
+   * Computes the predictions for the given inputs.
+   *
+   * Inputs:
+   *  - X : n examples with d features, of shape (n, d).
+   *  - w0, W, V : trained model parameters.
+   *
+   * Outputs:
+   *  - out : target vector, y.
+   */
+
+    out = fm::forward(X, w0, W, V);
+
+}
+```
+
+#### Running with dummy data:
+The [fm-regression-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-regression-dummy-data.dml) file is a convenient template to extend.
+
+<div class="codetabs">
+<div data-lang="Hadoop" markdown="1">
+    hadoop jar SystemML.jar -f ./scripts/nn/examples/fm-regression-dummy-data.dml
+
+</div>
+<div data-lang="Spark" markdown="1">
+    $SPARK_HOME/bin/spark-submit --master yarn
+                                 --deploy-mode cluster
+                                 --conf spark.driver.maxResultSize=0
+                                 SystemML.jar
+                                 -f ./scripts/nn/examples/fm-regression-dummy-data.dml
+                                 -config SystemML-config.xml
+                                 -exec hybrid_spark
+</div>
+</div>
+
+### 6.2. Binary Classification:
+ The sign of $$\hat{y}(x)$$ is used & the parameters are optimized for the hinge loss or logit loss.
+
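The sign rule can be sketched in NumPy as follows. This is purely illustrative (the actual SystemML scripts implement scoring in DML via `fm::forward`; the names `fm_forward` and `predict_class` here are hypothetical), and it reuses the O(kn) reformulation of the pairwise term:

```python
import numpy as np

def fm_forward(X, w0, w, V):
    """FM scores for a batch X of shape (m, n), O(k*n) per example."""
    linear = X @ w
    pairwise = 0.5 * np.sum((X @ V) ** 2 - (X ** 2) @ (V ** 2), axis=1)
    return w0 + linear + pairwise

def predict_class(X, w0, w, V):
    """Binary labels from the sign of the FM score."""
    return np.where(fm_forward(X, w0, w, V) >= 0, 1, -1)
```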
+### Usage:
+The `train()` function in the [fm-binclass.dml](https://github.com/apache/systemml/blob/master/scripts/staging/fm-binclass.dml) script takes in the input feature matrix and the corresponding target vector, with some of the input held out for validation during training. As in the regression case, this script contains both `train()` and `predict()` functions.
+
+#### Running with dummy data:
+The [fm-binclass-dummy-data.dml](https://github.com/apache/systemml/blob/master/scripts/nn/examples/fm-binclass-dummy-data.dml) file is a convenient template to extend.
+
+<div class="codetabs">
+<div data-lang="Hadoop" markdown="1">
+    hadoop jar SystemML.jar -f ./scripts/nn/examples/fm-binclass-dummy-data.dml
+
+</div>
+<div data-lang="Spark" markdown="1">
+    $SPARK_HOME/bin/spark-submit --master yarn
+                                 --deploy-mode cluster
+                                 --conf spark.driver.maxResultSize=0
+                                 SystemML.jar
+                                 -f ./scripts/nn/examples/fm-binclass-dummy-data.dml
+                                 -config SystemML-config.xml
+                                 -exec hybrid_spark
+</div>
+</div>
+
+### 6.3. Ranking:
+The vectors are ordered by the score of $$\hat{y}(x)$$, and the optimization is done over pairs of instance
+vectors $$ (x^{(a)}, x^{(b)}) \in D \times D $$ with a pairwise classification loss.
+
+Regularization terms such as $$L2$$ are usually added to the optimization objective to prevent overfitting.
+
diff --git a/docs/algorithms-reference.md b/docs/algorithms-reference.md
index 26c2141..9319093 100644
--- a/docs/algorithms-reference.md
+++ b/docs/algorithms-reference.md
@@ -55,5 +55,8 @@ limitations under the License.
   * [Kaplan-Meier Survival Analysis](algorithms-survival-analysis.html#kaplan-meier-survival-analysis)
   * [Cox Proportional Hazard Regression Model](algorithms-survival-analysis.html#cox-proportional-hazard-regression-model)
 
+* [Factorization Machines](algorithms-factorization-machines.html)
+  * [Factorization Machine](algorithms-factorization-machines.html#core-model)
+  
 * [Bibliography](algorithms-bibliography.html)