Posted to commits@systemml.apache.org by mb...@apache.org on 2018/08/05 21:53:48 UTC

systemml git commit: [SYSTEMML-2090] Language documentation for paramserv builtin function

Repository: systemml
Updated Branches:
  refs/heads/master 13baec95c -> fb90e3bff


[SYSTEMML-2090] Language documentation for paramserv builtin function

Closes #816.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/fb90e3bf
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/fb90e3bf
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/fb90e3bf

Branch: refs/heads/master
Commit: fb90e3bff41e9c0d58a80867243641faec437c09
Parents: 13baec9
Author: EdgarLGB <gu...@atos.net>
Authored: Sun Aug 5 14:37:59 2018 -0700
Committer: Matthias Boehm <mb...@gmail.com>
Committed: Sun Aug 5 14:53:50 2018 -0700

----------------------------------------------------------------------
 docs/dml-language-reference.md | 77 +++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/fb90e3bf/docs/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/docs/dml-language-reference.md b/docs/dml-language-reference.md
index 5bf9099..8a89d99 100644
--- a/docs/dml-language-reference.md
+++ b/docs/dml-language-reference.md
@@ -53,6 +53,7 @@ limitations under the License.
     * [Read/Write Built-In Functions](dml-language-reference.html#readwrite-built-in-functions)
     * [Data Pre-Processing Built-In Functions](dml-language-reference.html#data-pre-processing-built-in-functions)
     * [Deep Learning Built-In Functions](dml-language-reference.html#deep-learning-built-in-functions)
+    * [Parameter Server Built-In Function](dml-language-reference.html#parameter-server-built-in-function)
     * [Other Built-In Functions](dml-language-reference.html#other-built-in-functions)
   * [Frames](dml-language-reference.html#frames)
     * [Creating Frames](dml-language-reference.html#creating-frames)
@@ -1536,6 +1537,82 @@ Examples:
 | bias_add             |                             | `ones = matrix(1, rows=1, cols=height*width); output = input + matrix(bias %*% ones, rows=1, cols=numChannels*height*width)`                                |
 | bias_multiply        |                             | `ones = matrix(1, rows=1, cols=height*width); output = input * matrix(bias %*% ones, rows=1, cols=numChannels*height*width)`                                |
 
+### Parameter Server Built-In Function
+Apart from data-parallel operations and task-parallel parfor loops, SystemML also supports a **data-parallel parameter server** via the built-in function **paramserv**. Currently, both a local multi-threaded and a distributed Spark backend are supported for executing the **paramserv** function. So far, only a single parameter server with N workers is supported, with synchronous or asynchronous model updates per batch or epoch. For example, to train a model on the local backend with the BSP update strategy, 10 epochs, a batch size of 64, and 10 workers, the **paramserv** call looks like this:
+
+
+    resultModel = paramserv(model=initModel, features=X, labels=Y, upd="fun1", agg="fun2",
+                            mode="LOCAL", utype="BSP", epochs=10, batchsize=64, k=10,
+                            hyperparams=hParams)
+
+
+**Table**: Inputs of paramserv function
+
+Parameters | Description | Type | Mandatory | Options
+-------- | ----------- | ---------- | ---------- | -------
+model | All the model parameters (e.g., the weight and bias matrices) | list | yes |
+features | Training features | matrix | yes |
+labels | Training labels | matrix | yes |
+val_features | Validation features | matrix | no |
+val_labels | Validation labels | matrix | no |
+upd | Fully qualified name of the gradient calculation function, in the format "relative path::function name", e.g., "./mnist_lenet_paramserv_sgd.dml::gradients" | string | yes |
+agg | Fully qualified name of the gradient aggregation function, in the format "relative path::function name", e.g., "./mnist_lenet_paramserv_sgd.dml::aggregation" | string | yes |
+mode | Execution backend for data partitioning and worker execution | string | no | "LOCAL" (default), "REMOTE_SPARK"
+utype | Update strategy | string | no | "ASP" (default), "BSP"
+freq | Frequency of model updates | string | no | "EPOCH" (default), "BATCH"
+epochs | Number of epochs, where an epoch is a full scan over the data | integer | yes |
+batchsize | Size of a mini-batch (number of rows) | integer | no | 64 (default)
+k | Number of workers | integer | no | Number of vcores (default)
+scheme | Data partitioning scheme, i.e., how the data is distributed across workers | string | no | "DISJOINT_CONTIGUOUS" (default), "DISJOINT_ROUND_ROBIN", "DISJOINT_RANDOM", "OVERLAP_RESHUFFLE"
+hyperparams | Additional hyperparameters, e.g., learning rate, momentum | list | yes |
+checkpointing | Checkpoint strategy, currently not supported | string | no |
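+
+For illustration, a call that sets the optional parameters explicitly could look as follows; the gradient and aggregation functions reuse the example script path from the table above, and the chosen option values are arbitrary:
+
+    resultModel = paramserv(model=initModel, features=X, labels=Y,
+                            upd="./mnist_lenet_paramserv_sgd.dml::gradients",
+                            agg="./mnist_lenet_paramserv_sgd.dml::aggregation",
+                            mode="REMOTE_SPARK", utype="BSP", freq="BATCH",
+                            epochs=10, batchsize=64, k=10,
+                            scheme="DISJOINT_RANDOM", hyperparams=hParams)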
+
+**Table**: Output of paramserv function
+
+Output | Description | Type
+-------- | ----------- | ----------
+model | Trained model | list
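+
+Since the returned model has the same list structure as the input model, the trained parameter matrices can be extracted from the result via right indexing and casting. A minimal sketch, assuming the model list was created as list(W1, b1):
+
+    initModel = list(W1, b1)                # model given as a list of matrices
+    resultModel = paramserv(model=initModel, features=X, labels=Y, upd="fun1", agg="fun2",
+                            epochs=10, k=10, hyperparams=hParams)
+    W1_trained = as.matrix(resultModel[1])  # list indexing returns a list, hence the cast
+    b1_trained = as.matrix(resultModel[2])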
+
+**Update function:**
+
+The update function calculates the gradients for a single mini-batch and the given model (e.g., via a forward and backward pass through a neural network). The implementation of this function should follow the signature below (i.e., **the input parameters, including both types and names, have to match exactly, while the output name may differ**):
+
+```sh
+gradients = function(list[unknown] model, list[unknown] hyperparams,
+                     matrix[double] features, matrix[double] labels)
+          return (list[unknown] gradients)
+          # the output name may differ from "gradients", but the function must always return a list
+```
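+
+For illustration, a minimal update function for a simple linear model with a squared loss could look as follows; the two-entry model list (weights and bias) and the loss function are purely illustrative and not prescribed by paramserv:
+
+```sh
+# Hypothetical sketch: mini-batch gradients of a linear model with squared loss
+gradients = function(list[unknown] model, list[unknown] hyperparams,
+                     matrix[double] features, matrix[double] labels)
+          return (list[unknown] gradients)
+{
+  W = as.matrix(model[1])                     # weights, ncol(features) x ncol(labels)
+  b = as.matrix(model[2])                     # bias, 1 x ncol(labels)
+  pred = features %*% W + b                   # forward pass on the mini-batch
+  diff = pred - labels                        # backward pass for the squared loss
+  dW = t(features) %*% diff / nrow(features)
+  db = colSums(diff) / nrow(features)
+  gradients = list(dW, db)
+}
+```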
+
+**Aggregate function:**
+
+The aggregate function then takes the computed or accrued gradients and updates the model via some optimizer such as Adagrad or Adam. The implementation of this function should follow the signature below (i.e., **the input parameters, including both types and names, have to match exactly, while the output name may differ**):
+
+```sh
+aggregation = function(list[unknown] model, list[unknown] hyperparams,
+                       list[unknown] gradients)
+         return (list[unknown] modelResult)
+         # the output name may differ from "modelResult", but the function must always return a list
+```
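+
+Continuing the illustrative example above, a minimal aggregation function that applies a vanilla SGD step could look like this; it assumes the learning rate is the first entry of the hyperparams list (e.g., hParams = list(0.01)):
+
+```sh
+# Hypothetical sketch: vanilla SGD update of the model with the received gradients
+aggregation = function(list[unknown] model, list[unknown] hyperparams,
+                       list[unknown] gradients)
+         return (list[unknown] modelResult)
+{
+  W = as.matrix(model[1])
+  b = as.matrix(model[2])
+  dW = as.matrix(gradients[1])
+  db = as.matrix(gradients[2])
+  lr = as.scalar(hyperparams[1])              # assumed: learning rate as first list entry
+  W = W - lr * dW                             # gradient descent step
+  b = b - lr * db
+  modelResult = list(W, b)
+}
+```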
+
+**Update strategy:**
+
+Currently, two update strategies, **ASP** and **BSP**, are supported. With **ASP**, a.k.a. _Asynchronous Parallel_, model updates are applied asynchronously: as soon as a worker pushes its fresh gradients, the parameter server updates the model and immediately makes the updated model available, so that the worker can pull it and continue. This push-and-pull process happens independently across workers. With **BSP**, a.k.a. _Bulk Synchronous Parallel_, the server only updates the global model once it has received the gradients of all workers for the current iteration, and only then can the workers move on to the next iteration. Hence, the overall performance of BSP is bound by stragglers (i.e., the slowest worker).
+
+**Update frequency:**
+
+The update frequency determines how often the push-and-pull process between workers and the parameter server takes place. Currently, two update frequencies, **EPOCH** and **BATCH**, are supported. With **EPOCH**, the gradients of each mini-batch are accumulated locally in each worker and the accrued gradients are pushed to the server only once the worker has finished an epoch. With **BATCH**, the gradients of each mini-batch are pushed to the server immediately, triggering the push-and-pull process per mini-batch. For example, with 60,000 training rows, 10 workers, and a batch size of 64, each worker processes about 94 mini-batches per epoch; with **BATCH** it pushes gradients roughly 94 times per epoch, whereas with **EPOCH** it pushes its accumulated gradients once per epoch.
+
+**Data partition schemes:**
+
+Before launching the data-parallel parameter server, the input data is partitioned across the workers according to the chosen scheme. Currently, four schemes are supported: Disjoint_Contiguous, Disjoint_Round_Robin, Disjoint_Random, and Overlap_Reshuffle, as defined in the table below and illustrated by the sketch after the table.
+
+Scheme | Definition
+-------- | -----------
+Disjoint_Contiguous | For each worker, use a right indexing operation X[beg:end,] to obtain contiguous, non-overlapping partitions of rows
+Disjoint_Round_Robin | For each worker, use a permutation multiply or, simpler, a removeEmpty operation such as removeEmpty(target=X, margin="rows", select=(seq(1,nrow(X))%%k)==id)
+Disjoint_Random | For each worker, use a permutation multiply P[beg:end,] %*% X, where P is constructed for example with P=table(seq(1,nrow(X)), sample(nrow(X), nrow(X))), i.e., sampling without replacement to ensure disjointness
+Overlap_Reshuffle | Similar to the above, except that a new permutation matrix is created for each worker and the indexing on P is omitted
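+
+For illustration, the following standalone DML sketch shows the indexing operations behind the first two schemes for a single worker; paramserv performs this partitioning internally based on the scheme argument, so the snippet only serves to make the table concrete (worker ids are assumed to be 0-based here):
+
+```sh
+# Hypothetical sketch of DISJOINT_CONTIGUOUS and DISJOINT_ROUND_ROBIN partitioning
+X = rand(rows=1000, cols=10)                  # training features
+k = 10                                        # number of workers
+id = 3                                        # worker id in 0..k-1
+
+# DISJOINT_CONTIGUOUS: contiguous, non-overlapping row ranges per worker
+blk = as.integer(ceil(nrow(X) / k))
+beg = id * blk + 1
+end = min((id + 1) * blk, nrow(X))
+X_contiguous = X[beg:end,]
+
+# DISJOINT_ROUND_ROBIN: every k-th row, offset by the worker id
+X_roundrobin = removeEmpty(target=X, margin="rows", select=(seq(1,nrow(X)) %% k) == id)
+```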
+
 ### Other Built-In Functions
 
 **Table 16**: Other Built-In Functions