Posted to commits@singa.apache.org by wa...@apache.org on 2016/04/12 08:22:21 UTC

svn commit: r1738695 [8/10] - in /incubator/singa/site/trunk/content: ./ markdown/docs/ markdown/docs/zh/ markdown/releases/ markdown/v0.2.0/ markdown/v0.2.0/jp/ markdown/v0.2.0/kr/ markdown/v0.2.0/zh/

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,215 @@
+# MLP Example
+
+---
+
+Multilayer perceptron (MLP) is a subclass of feed-forward neural networks.
+An MLP typically consists of multiple directly connected layers, with each layer fully
+connected to the next one. In this example, we will use SINGA to train a
+[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
+for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
+
+## Running instructions
+
+Please refer to the [installation](installation.html) page for
+instructions on building SINGA, and the [quick start](quick-start.html)
+for instructions on starting zookeeper.
+
+We have provided scripts for preparing the training and test dataset in *examples/mnist/*.
+
+    # in examples/mnist
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+### Training on CPU
+
+After the datasets are prepared, we start the training by
+
+    ./bin/singa-run.sh -conf examples/mnist/job.conf
+
+After it is started, you should see output like
+
+    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
+    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
+    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
+    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
+    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
+    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
+    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
+    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
+    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
+    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
+    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
+    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
+    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
+    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
+    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
+    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
+    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
+    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
+    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
+    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900
+
+After training for a number of steps (depending on the configuration) or when the job
+finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.
+
+### Training on GPU
+
+To train this example model on GPU, just add a field in the configuration file for
+the GPU device,
+
+    # job.conf
+    gpu: 0
+
+### Training using Python script
+
+The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example,
+the job.conf is replaced with a simple Python script, mnist_mlp.py,
+which has about 30 lines of code following the [Keras API](http://keras.io/).
+
+    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
+
+
+
+## Details
+
+To train a model in SINGA, you need to prepare the datasets,
+and a job configuration which specifies the neural net structure, training
+algorithm (BP or CD), SGD update algorithm (e.g. Adagrad),
+number of training/test steps, etc.
+
+### Data preparation
+
+Before using SINGA, you need to write a program to pre-process the dataset you
+use to a format that SINGA can read. Please refer to the
+[Data Preparation](data.html) to get details about preparing
+this MNIST dataset.
+
+
+### Neural net
+
+<div style = "text-align: center">
+<img src = "../images/example-mlp.png" style = "width: 230px">
+<br/><strong>Figure 1 - Net structure of the MLP example. </strong></img>
+</div>
+
+
+Figure 1 shows the structure of the simple MLP model, which is constructed following
+[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
+two layers which represent one feature transformation stage. There are 6 such
+stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these circles decrease from
+2500->2000->1500->1000->500->10.
+
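+A quick way to grasp the scale of this model is to count its parameters. The
+following sketch is illustrative only; it assumes the 784-dimensional MNIST input
+(as configured in the data layer below) and one weight matrix plus one bias vector
+per InnerProductLayer.
+
+    # illustrative sketch: parameter count of the MLP (784 input features)
+    dims = [784, 2500, 2000, 1500, 1000, 500, 10]
+    total = sum(n_in * n_out + n_out  # weight matrix + bias vector
+                for n_in, n_out in zip(dims[:-1], dims[1:]))
+    print(total)  # roughly 12 million parameters
+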
+Next we follow the guide in [neural net page](neural-net.html)
+and [layer page](layer.html) to write the neural net configuration.
+
+* We configure an input layer to read the training/testing records from a disk file.
+
+        layer {
+            name: "data"
+            type: kRecordInput
+            store_conf {
+              backend: "kvfile"
+              path: "examples/mnist/train_data.bin"
+              random_skip: 5000
+              batchsize: 64
+              shape: 784
+              std_value: 127.5
+              mean_value: 127.5
+             }
+             exclude: kTest
+          }
+
+        layer {
+            name: "data"
+            type: kRecordInput
+            store_conf {
+              backend: "kvfile"
+              path: "examples/mnist/test_data.bin"
+              batchsize: 100
+              shape: 784
+              std_value: 127.5
+              mean_value: 127.5
+             }
+             exclude: kTrain
+          }
+
+
+* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly as,
+
+        layer{
+          name: "fc1"
+          type: kInnerProduct
+          srclayers:"data"
+          innerproduct_conf{
+            num_output: 2500
+          }
+          param{
+            name: "w1"
+            ...
+          }
+          param{
+            name: "b1"
+            ...
+          }
+        }
+
+    with the `num_output` decreasing from 2500 to 10.
+
+* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer
+except the last one. It transforms the feature via scaled tanh function.
+
+        layer{
+          name: "tanh1"
+          type: kSTanh
+          srclayers:"fc1"
+        }
+
+* The final [Softmax loss layer](layer.html#softmaxloss) connects
+to the last InnerProductLayer (fc6) and to the data layer, which provides the labels.
+
+        layer{
+          name: "loss"
+          type:kSoftmaxLoss
+          softmaxloss_conf{ topk:1 }
+          srclayers:"fc6"
+          srclayers:"data"
+        }
+
+### Updater
+
+The [normal SGD updater](updater.html#updater) is selected.
+The learning rate decays by a factor of 0.997 every 60 training steps.
+
+    updater{
+      type: kSGD
+      learning_rate{
+        base_lr: 0.001
+        type : kStep
+        step_conf{
+          change_freq: 60
+          gamma: 0.997
+        }
+      }
+    }
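+
+Assuming `kStep` follows the usual step-decay rule (the learning rate is multiplied by
+`gamma` once every `change_freq` steps), the schedule above can be sketched in a few
+lines of Python:
+
+    # illustrative sketch, assuming lr(step) = base_lr * gamma^(step // change_freq)
+    def step_lr(step, base_lr=0.001, gamma=0.997, change_freq=60):
+        return base_lr * gamma ** (step // change_freq)
+
+    print(step_lr(0))    # 0.001
+    print(step_lr(600))  # about 0.00097 after 10 decays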
+
+### TrainOneBatch algorithm
+
+The MLP model is a feed-forward model, hence
+[Back-propagation algorithm](train-one-batch.html#back-propagation)
+is selected.
+
+    train_one_batch {
+      alg: kBP
+    }
+
+### Cluster setting
+
+The following configuration sets a single worker and server for training.
+The [Training frameworks](frameworks.html) page introduces configurations for several distributed
+training frameworks.
+
+    cluster {
+      nworker_groups: 1
+      nserver_groups: 1
+    }

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,294 @@
+# Model Configuration
+
+---
+
+SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters
+of deep learning models.  For each SGD iteration, there is a
+[Worker](architecture.html) computing
+gradients of parameters from the NeuralNet and an [Updater]() updating parameter
+values based on the gradients. Hence the model configuration mainly consists of these
+three parts. We will introduce the NeuralNet, Worker and Updater in the
+following paragraphs and describe their configurations. All model
+configuration is specified in the model.conf file in the user-provided
+workspace folder. E.g., the [cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10)
+has a model.conf file.
+
+
+## NeuralNet
+
+### Uniform model (neuralnet) representation
+
+<img src = "../images/model-categorization.png" style = "width: 400px"> Fig. 1:
+Deep learning model categorization</img>
+
+Many deep learning models have been proposed. Fig. 1 is a categorization of
+popular deep learning models based on the layer connections. The
+[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h)
+abstraction of SINGA consists of multiple directly connected layers. This
+abstraction is able to represent models from all three categories.
+
+  * For the feed-forward models, their connections are already directed.
+
+  * For the RNN models, we unroll them into directed connections, as shown in
+  Fig. 3.
+
+  * For the undirected connections in RBM, DBM, etc., we replace each undirected
+  connection with two directed connections, as shown in Fig. 2.
+
+<div style = "height: 200px">
+<div style = "float:left; text-align: center">
+<img src = "../images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: Unroll RBM </img>
+</div>
+<div style = "float:left; text-align: center; margin-left: 40px">
+<img src = "../images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: Unroll RNN </img>
+</div>
+</div>
+
+Specifically, the NeuralNet class is defined in
+[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) :
+
+    ...
+    vector<Layer*> layers_;
+    ...
+
+The Layer class is defined in
+[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h):
+
+    vector<Layer*> srclayers_, dstlayers_;
+    LayerProto layer_proto_;  // layer configuration, including meta info, e.g., name
+    ...
+
+
+The connections with other layers are kept in `srclayers_` and `dstlayers_`.
+Since there are many different feature transformations, there are
+correspondingly many different Layer implementations. Layers that have
+parameters in their feature transformation functions hold Param
+instances in the layer class, e.g.,
+
+    Param weight;
+
+
+### Configure the structure of a NeuralNet instance
+
+To train a deep learning model, the first step is to write the configurations
+for the model structure, i.e., the layers and connections for the NeuralNet.
+Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol
+Buffer](https://developers.google.com/protocol-buffers/) to define the
+configuration protocol. The
+[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto)
+specifies the configuration fields for a NeuralNet instance,
+
+    message NetProto {
+      repeated LayerProto layer = 1;
+      ...
+    }
+
+The configuration is then
+
+    layer {
+      // layer configuration
+    }
+    layer {
+      // layer configuration
+    }
+    ...
+
+To configure the model structure, we just configure each layer involved in the model.
+
+    message LayerProto {
+      // the layer name used for identification
+      required string name = 1;
+      // source layer names
+      repeated string srclayers = 3;
+      // parameters, e.g., weight matrix or bias vector
+      repeated ParamProto param = 12;
+      // the layer type from the enum above
+      required LayerType type = 20;
+      // configuration for convolution layer
+      optional ConvolutionProto convolution_conf = 30;
+      // configuration for concatenation layer
+      optional ConcateProto concate_conf = 31;
+      // configuration for dropout layer
+      optional DropoutProto dropout_conf = 33;
+      ...
+    }
+
+A sample configuration for a feed-forward model is like
+
+    layer {
+      name : "input"
+      type : kRecordInput
+    }
+    layer {
+      name : "fc"
+      type : kInnerProduct
+      srclayers : "input"
+      param {
+        // configuration for parameter
+      }
+      innerproduct_conf {
+        // configuration for this specific layer
+      }
+      ...
+    }
+
+The layer type list is defined in
+[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto).
+One type (kFoo) corresponds to one child class of Layer (FooLayer) and one
+configuration field (foo_conf). All built-in layers are introduced in the [layer page](layer.html).
+
+## Worker
+
+At the beginning, the Worker will initialize the values of the Param instances of
+each layer either randomly (according to the user-configured distribution) or
+by loading from a [checkpoint file]().  For each training iteration, the worker
+visits layers of the neural network to compute gradients of the Param instances of
+each layer. Corresponding to the three categories of models, there are three
+different algorithms to compute the gradients of a neural network.
+
+  1. Back-propagation (BP) for feed-forward models
+  2. Back-propagation through time (BPTT) for recurrent neural networks
+  3. Contrastive divergence (CD) for RBM, DBM, etc models.
+
+SINGA has provided these three algorithms as three Worker implementations.
+Users only need to specify in the model.conf file which algorithm
+should be used. The configuration protocol is
+
+    message ModelProto {
+      ...
+      enum GradCalcAlg {
+      // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN
+      kBP = 1;
+      // BPTT for recurrent neural networks
+      kBPTT = 2;
+      // CD algorithm for RBM, DBM etc., models
+      kCd = 3;
+      }
+      // gradient calculation algorithm
+      required GradCalcAlg alg = 8 [default = kBP];
+      ...
+    }
+
+These algorithms override the TrainOneBatch function of the Worker. E.g., the
+BPWorker implements it as
+
+    void BPWorker::TrainOneBatch(int step, Metric* perf) {
+      Forward(step, kTrain, train_net_, perf);
+      Backward(step, train_net_);
+    }
+
+The Forward function passes the raw input features of one mini-batch through
+all layers, and the Backward function visits the layers in reverse order to
+compute the gradients of the loss w.r.t each layer's feature and each layer's
+Param objects. Different algorithms would visit the layers in different orders.
+Some may traverse the neural network multiple times, e.g., the CDWorker's
+TrainOneBatch function is:
+
+    void CDWorker::TrainOneBatch(int step, Metric* perf) {
+      PositivePhase(step, kTrain, train_net_, perf);
+      NegativePhase(step, kTrain, train_net_, perf);
+      GradientPhase(step, train_net_);
+    }
+
+Each `*Phase` function would visit all layers one or multiple times.
+All algorithms will finally call two functions of the Layer class:
+
+     /**
+      * Transform features from connected layers into features of this layer.
+      *
+      * @param phase kTrain, kTest, kPositive, etc.
+      */
+     virtual void ComputeFeature(Phase phase, Metric* perf) = 0;
+     /**
+      * Compute gradients for parameters (and connected layers).
+      *
+      * @param phase kTrain, kTest, kPositive, etc.
+      */
+     virtual void ComputeGradient(Phase phase) = 0;
+
+All [Layer implementations]() must implement the above two functions.
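+
+To make the control flow concrete, here is a minimal Python sketch of how a
+BP-style `TrainOneBatch` could drive these two functions; the method names mirror
+the C++ declarations above, but the layer class and the code itself are only
+illustrative.
+
+    # illustrative sketch: BP visits layers forward, then in reverse order
+    class PrintLayer:
+        def __init__(self, name):
+            self.name = name
+        def ComputeFeature(self, phase):
+            print("forward ", self.name, phase)
+        def ComputeGradient(self, phase):
+            print("backward", self.name, phase)
+
+    def train_one_batch(layers, phase="kTrain"):
+        for layer in layers:            # Forward pass over all layers
+            layer.ComputeFeature(phase)
+        for layer in reversed(layers):  # Backward pass in reverse order
+            layer.ComputeGradient(phase)
+
+    train_one_batch([PrintLayer("data"), PrintLayer("fc1"), PrintLayer("loss")])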
+
+
+## Updater
+
+Once the gradients of parameters are computed, the Updater will update
+parameter values.  There are many SGD variants for updating parameters, like
+[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf),
+[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf),
+[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf),
+[Nesterov](http://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=DJ8Ep8YAAAAJ&amp;citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C)
+and SGD with momentum. The core functions of the Updater are
+
+    /**
+     * Update parameter values based on gradients
+     * @param step training step
+     * @param param pointer to the Param object
+     * @param grad_scale scaling factor for the gradients
+     */
+    void Update(int step, Param* param, float grad_scale=1.0f);
+    /**
+     * @param step training step
+     * @return the learning rate for this step
+     */
+    float GetLearningRate(int step);
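+
+As a rough illustration of what `Update` does for the default `kSGD` updater, the
+Python sketch below applies a standard momentum plus weight-decay update; the exact
+formula used by SINGA may differ, so treat it only as a mental model.
+
+    import numpy as np
+
+    # illustrative sketch: one SGD update with momentum and weight decay
+    def sgd_update(value, grad, history, lr, momentum=0.9, weight_decay=0.0005):
+        grad = grad + weight_decay * value           # add the L2 weight-decay term
+        history[:] = momentum * history - lr * grad  # accumulate the momentum history
+        value += history                             # apply the update in place
+
+    w = np.zeros(4)
+    h = np.zeros_like(w)
+    sgd_update(w, np.ones(4), h, lr=0.001)
+    print(w)  # each entry moves by -0.001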
+
+SINGA provides several built-in updaters and learning rate change methods.
+Users can configure them according to the UpdaterProto
+
+    message UpdaterProto {
+      enum UpdaterType{
+        // normal SGD with momentum and weight decay
+        kSGD = 1;
+        // adaptive subgradient, http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
+        kAdaGrad = 2;
+        // http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
+        kRMSProp = 3;
+        // Nesterov first optimal gradient method
+        kNesterov = 4;
+      }
+      // updater type
+      required UpdaterType type = 1 [default=kSGD];
+      // configuration for RMSProp algorithm
+      optional RMSPropProto rmsprop_conf = 50;
+
+      enum ChangeMethod {
+        kFixed = 0;
+        kInverseT = 1;
+        kInverse = 2;
+        kExponential = 3;
+        kLinear = 4;
+        kStep = 5;
+        kFixedStep = 6;
+      }
+      // change method for learning rate
+      required ChangeMethod lr_change = 2 [default = kFixed];
+
+      optional FixedStepProto fixedstep_conf = 40;
+      ...
+      optional float momentum = 31 [default = 0];
+      optional float weight_decay = 32 [default = 0];
+      // base learning rate
+      optional float base_lr = 34 [default = 0];
+    }
+
+
+## Other model configuration fields
+
+Some other important configuration fields for training a deep learning model are
+listed below:
+
+    // model name, e.g., "cifar10-dcnn", "mnist-mlp"
+    string name;
+    // displaying training info for every this number of iterations, default is 0
+    int32 display_freq;
+    // total num of steps/iterations for training
+    int32 train_steps;
+    // do test for every this number of training iterations, default is 0
+    int32 test_freq;
+    // run test for this number of steps/iterations, default is 0.
+    // The test dataset has test_steps * batchsize instances.
+    int32 test_steps;
+    // do checkpoint for every this number of training steps, default is 0
+    int32 checkpoint_freq;
+
+The [checkpoint and restore](checkpoint.html) page has details on checkpoint-related fields.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,327 @@
+# Neural Net
+
+---
+
+`NeuralNet` in SINGA represents an instance of a user's neural net model. As the
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](layer.html)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="../images/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="../images/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, are easy to configure as their layer
+connections are directed and contain no cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayer: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayer: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayer: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayer: "hidden"
+        srclayer: "label"
+      }
+    }
+
+### Energy models
+
+<img src="../images/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models including RBM, DBM,
+etc., their connections are undirected (i.e., Category B). To represent these models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source
+layer field should include each other's name.
+The full [RBM example](rbm.html) has
+a detailed neural net configuration for an RBM model, which looks like
+
+    net {
+      layer {
+        name : "vis"
+        type : kVisLayer
+        param {
+          name : "w1"
+        }
+        srclayer: "hid"
+      }
+      layer {
+        name : "hid"
+        type : kHidLayer
+        param {
+          name : "w2"
+          share_from: "w1"
+        }
+        srclayer: "vis"
+      }
+    }
+
+### RNN models
+
+For recurrent neural networks (RNN), users can remove the recurrent connections
+by unrolling the recurrent layer.  For example, in Figure 3b, the original
+layer is unrolled into a new layer with 4 internal layers. In this way, the
+model is like a normal feed-forward model, thus can be configured similarly.
+The [RNN example](rnn.html) has a full neural net
+configuration for a RNN model.
+
+
+## Configuration for multiple nets
+
+Typically, a training job includes three neural nets for
+training, validation and test phase respectively. The three neural nets share most
+layers except the data layer, loss layer or output layer, etc. To avoid
+redundant configurations for the shared layers, users can use the `exclude`
+field to filter out a layer in the neural net, e.g., the following layer will be
+filtered when creating the testing `NeuralNet`.
+
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+
+
+## Neural net partitioning
+
+A neural net can be partitioned in different ways to distribute the training
+over multiple workers.
+
+### Batch and feature dimension
+
+<img src="../images/partition_fc.png" align="center" width="400px"/>
+<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>
+
+
+Every layer's feature blob is considered a matrix whose rows are feature
+vectors. Thus, one layer can be split on two dimensions. Partitioning on
+dimension 0 (also called batch dimension) slices the feature matrix by rows.
+For instance, if the mini-batch size is 256 and the layer is partitioned into 2
+sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
+Partitioning on this dimension has no effect on the parameters, as every
+[Param](param.html) object is replicated in the sub-layers. Partitioning on dimension
+1 (also called feature dimension) slices the feature matrix by columns. For
+example, suppose the original feature vector has 50 units, after partitioning
+into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
+result in [Param](param.html) object being split, as shown in
+Figure 4. Both the bias vector and weight matrix are
+partitioned into two sub-layers.
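+
+The two dimensions are easy to picture with a small NumPy sketch (illustrative only,
+not SINGA code): partitioning on dimension 0 splits the rows (instances) of the
+feature matrix while replicating the parameters, whereas partitioning on dimension 1
+splits its columns (units) together with the weight matrix and bias vector. The
+source-layer size of 30 below is an arbitrary choice for illustration.
+
+    import numpy as np
+
+    feats = np.random.rand(256, 50)   # feature blob: 256 instances x 50 units
+    weight = np.random.rand(30, 50)   # weight matrix: 30 source units -> 50 units
+    bias = np.random.rand(50)
+
+    # dimension 0 (batch): split instances; Param objects are replicated, not split
+    batch_parts = np.split(feats, 2, axis=0)    # two blobs of shape (128, 50)
+
+    # dimension 1 (feature): split units; weight and bias are split as well
+    feat_parts = np.split(feats, 2, axis=1)     # two blobs of shape (256, 25)
+    weight_parts = np.split(weight, 2, axis=1)  # two matrices of shape (30, 25)
+    bias_parts = np.split(bias, 2)              # two vectors of length 25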
+
+
+### Partitioning configuration
+
+There are 4 partitioning schemes, whose configurations are given below,
+
+  1. Partitioning each single layer into sub-layers on the batch dimension (see
+  below). It is enabled by configuring the partition dimension of the layer to
+  0, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 0
+          }
+
+  2. Partitioning each single layer into sub-layers on the feature dimension (see
+  below).  It is enabled by configuring the partition dimension of the layer to
+  1, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 1
+          }
+
+  3. Partitioning all layers into different subsets. It is enabled by
+  configuring the location ID of a layer, e.g.,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+
+
+  4. Hybrid partitioning of strategy 1, 2 and 3. The hybrid partitioning is
+  useful for large models. An example application is to implement the
+  [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
+  Hybrid partitioning is configured like,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+          layer {
+            partition_dim: 0
+            location: 0
+          }
+          layer {
+            partition_dim: 1
+            location: 0
+          }
+
+Currently SINGA supports strategy-2 well. Other partitioning strategies are
+under test and will be released in a later version.
+
+## Parameter sharing
+
+Parameters can be shared in two cases,
+
+  * sharing parameters among layers via user configuration. For example, the
+  visible layer and hidden layer of an RBM share the weight matrix, which is configured through
+  the `share_from` field as shown in the RBM configuration above. The
+  configurations must be the same (except for the name) for shared parameters.
+
+  * due to neural net partitioning, some `Param` objects are replicated into
+  different workers, e.g., partitioning one layer on batch dimension. These
+  workers share parameter values. SINGA controls this kind of parameter
+  sharing automatically, users do not need to do any configuration.
+
+  * the `NeuralNet` instances for training and testing (and validation) share most
+  layers, and thus share `Param` values.
+
+If the shared `Param` instances reside in the same process (possibly in different
+threads), they use the same chunk of memory for their values, but they
+have separate memory spaces for their gradients. In fact, their
+gradients will be averaged by the stub or server.
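+
+A minimal NumPy sketch of this arrangement (illustrative only): the replicas see the
+same value array but keep their own gradient arrays, and the gradients are averaged
+before a single update is applied to the shared values.
+
+    import numpy as np
+
+    value = np.zeros(4)                          # shared memory for parameter values
+    grads = [np.full(4, 1.0), np.full(4, 3.0)]   # per-replica gradients
+
+    avg_grad = np.mean(grads, axis=0)   # the stub/server averages the gradients
+    value -= 0.1 * avg_grad             # one update is applied to the shared values
+    print(value)                        # [-0.2 -0.2 -0.2 -0.2]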
+
+## Advanced user guide
+
+### Creation
+
+    static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num);
+
+The above function creates a `NeuralNet` for a given phase, and returns a
+pointer to the `NeuralNet` instance. The phase is in {kTrain,
+kValidation, kTest}. `num` is used for net partitioning which indicates the
+number of partitions.  Typically, a training job includes three neural nets for
+training, validation and test phase respectively. The three neural nets share most
+layers except the data layer, loss layer or output layer, etc. The `Create`
+function takes in the full net configuration including layers for training,
+validation and test.  It removes layers for phases other than the specified
+phase based on the `exclude` field in
+[layer configuration](layer.html):
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+The filtered net configuration is passed to the constructor of `NeuralNet`:
+
+    NeuralNet::NeuralNet(NetProto netproto, int npartitions);
+
+The constructor first creates a graph representing the net structure in
+
+    Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);
+
+Next, it creates a layer for each node and connects layers if their nodes are
+connected.
+
+    void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);
+
+Since the `NeuralNet` instance may be shared among multiple workers, the
+`Create` function returns a pointer to the `NeuralNet` instance.
+
+### Parameter sharing
+
+`Param` sharing
+is enabled by first sharing the Param configuration (in `NeuralNet::Create`)
+to create two similar (e.g., same-shaped) Param objects, and then calling
+(in `NeuralNet::CreateNetFromGraph`),
+
+    void Param::ShareFrom(const Param& from);
+
+It is also possible to share `Param`s of two nets, e.g., sharing parameters of
+the training net and the test net,
+
+    void NeuralNet::ShareParamsFrom(NeuralNet* other);
+
+It will call `Param::ShareFrom` for each Param object.
+
+### Access functions
+`NeuralNet` provides a couple of access functions to get the layers and params
+of the net:
+
+    const std::vector<Layer*>& layers() const;
+    const std::vector<Param*>& params() const;
+    Layer* name2layer(string name) const;
+    Param* paramid2param(int id) const;
+
+
+### Partitioning
+
+
+#### Implementation
+
+SINGA partitions the neural net in the `CreateGraph` function, which creates one
+node for each (partitioned) layer. For example, if one layer's partition
+dimension is 0 or 1, then it creates `npartitions` nodes for it; if the
+partition dimension is -1, a single node is created, i.e., no partitioning.
+Each node is assigned a partition (or location) ID. If the original layer is
+configured with a location ID, then the ID is assigned to each newly created node.
+These nodes are connected according to the connections of the original layers.
+Some connection layers will be added automatically.
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer which concatenates feature rows (or
+columns) and a slice layer which slices feature rows (or columns) would be
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
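+
+A rough NumPy picture of what the automatically inserted connection layers do
+(illustrative only): a concatenation layer re-assembles partitioned feature blobs,
+and a slice layer cuts a blob into the pieces expected by the next partitioned layer.
+
+    import numpy as np
+
+    # two sub-layers partitioned on the feature dimension produce column pieces
+    pieces = [np.ones((128, 25)), 2 * np.ones((128, 25))]
+    full = np.concatenate(pieces, axis=1)    # concatenation layer: shape (128, 50)
+
+    # the next layer is partitioned on the batch dimension, so slice by rows
+    row_pieces = np.split(full, 2, axis=0)   # slice layer: two blobs of shape (64, 50)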
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one
+worker. Particularly, the pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID.  When
+every worker computes the gradients of the entire model parameters
+(strategy-1), we refer to this process as data parallelism.  When different
+workers compute the gradients of different parameters (strategy-2 or
+strategy-3), we call this process model parallelism.  The hybrid partitioning
+leads to hybrid parallelism where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters.  For example, to implement hybrid parallelism for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.
+

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,54 @@
+# Neural Net Partition
+
+---
+
+The purpose of partitioning a neural network is to distribute the partitions onto
+different working units (e.g., threads or nodes, called workers in this article)
+and parallelize the processing.
+Another reason for partitioning is to handle a large neural network which cannot be
+held in a single node. For instance, to train models on high-resolution images
+we need large neural networks (in terms of the number of parameters).
+
+Since *Layer* is the first-class citizen in SINGA, we partition against
+layers. Specifically, we support partitioning at two levels. First, users can configure
+the location (i.e., worker ID) of each layer. In this way, users assign one worker
+to each layer. Second, for one layer, we can partition its neurons or partition
+the instances (e.g., images). They are called layer partition and data partition
+respectively. We illustrate the two types of partitions using a simple convolutional neural network.
+
+<img src="../images/conv-mnist.png" style="width: 220px"/>
+
+The above figure shows a convolutional neural network without any partition. It
+has 8 layers in total (one rectangle represents one layer). The first layer is
+DataLayer (data) which reads data from local disk files/databases (or HDFS). The second layer
+is a MnistLayer which parses the records from MNIST data to get the pixels of a batch
+of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label
+of each image in the batch. The ConvolutionalLayer (conv1) transforms the input image to the
+shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1)
+sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It
+multiplies each image with a weight matrix to generate a 10-dimensional hidden feature which
+is then normalized by a SoftmaxLossLayer to get the prediction.
+
+<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 3 partitions using data partition.
+The red layers process 4 images of the batch, while the black and blue layers process 2 images
+each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer,
+BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm.
+Layers of the same color reside in the same worker. Data is transferred
+across different workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer),
+e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
+
+<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 2 partitions using layer partition. We can
+see that each layer processes all 8 images from the batch, but different partitions process
+different parts of one image. For instance, the layer conv1-00 processes only 4 channels. The other
+4 channels are processed by conv1-01, which resides in another worker.
+
+
+Since the partition is done at the layer level, we can apply different partitions for
+different layers to get a hybrid partition for the whole neural network. Moreover,
+we can also specify layer locations to place different layers on different workers.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,93 @@
+# Introduction
+
+---
+
+SINGA is a general distributed deep learning platform for training big deep
+learning models over large datasets. It is designed with an intuitive
+programming model based on the layer abstraction. A variety
+of popular deep learning models are supported, namely feed-forward models including
+convolutional neural networks (CNN), energy models like restricted Boltzmann
+machine (RBM), and recurrent neural networks (RNN). Many built-in layers are
+provided for users. The SINGA architecture is
+sufficiently flexible to run synchronous, asynchronous and hybrid training
+frameworks.  SINGA
+also supports different neural net partitioning schemes to parallelize the
+training of large models, namely partitioning on the batch dimension, the feature
+dimension, or hybrid partitioning.
+
+
+## Goals
+
+As a distributed system, the first goal of SINGA is to have good scalability. In other
+words, SINGA is expected to reduce the total training time needed to achieve a certain
+accuracy when given more computing resources (i.e., machines).
+
+
+The second goal is to make SINGA easy to use.
+It is non-trivial for programmers to develop and train models with deep and
+complex model structures.  Distributed training further increases the burden of
+programmers, e.g., data and model partitioning, and network communication.  Hence it is essential to
+provide an easy to use programming model so that users can implement their deep
+learning models/algorithms without much awareness of the underlying distributed
+platform.
+
+## Principles
+
+Scalability is a challenging research problem for distributed deep learning
+training. SINGA provides a general architecture to exploit the scalability of
+different training frameworks. Synchronous training frameworks improve the
+efficiency of one training iteration, and
+asynchronous training frameworks improve the convergence rate. Given a fixed budget
+(e.g., cluster size), users can run a hybrid framework that maximizes the
+scalability by trading off between efficiency and convergence rate.
+
+SINGA comes with a programming model designed based on the layer abstraction, which
+is intuitive for deep learning models.  A variety of
+popular deep learning models can be expressed and trained using this programming model.
+
+## System overview
+
+<img src="../images/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow.</strong></span>
+
+Training a deep learning model means finding the optimal parameters involved in
+the transformation functions that generate good features for specific tasks.
+The goodness of a set of parameters is measured by a loss function, e.g.,
+[Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since the
+loss functions are usually non-linear and non-convex, it is difficult to get a
+closed form solution. Typically, people use the stochastic gradient descent
+(SGD) algorithm, which randomly
+initializes the parameters and then iteratively updates them to reduce the loss
+as shown in Figure 1.
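+
+As a minimal illustration of the flow in Figure 1 (plain Python, not SINGA code), the
+sketch below randomly initializes a parameter vector and iteratively updates it with
+gradients of a simple squared-error loss:
+
+    import numpy as np
+
+    target = np.array([1.0, -2.0, 3.0])
+    w = np.random.randn(3)          # random initialization
+    for step in range(100):         # iterative SGD updates
+        grad = 2 * (w - target)     # gradient of the loss ||w - target||^2
+        w -= 0.1 * grad             # update with learning rate 0.1
+    print(w)                        # close to the target after 100 steps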
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview.</strong></span>
+
+SGD is used in SINGA to train
+parameters of deep learning models. The training workload is distributed over
+worker and server units as shown in Figure 2. In each
+iteration, every worker calls *TrainOneBatch* function to compute
+parameter gradients. *TrainOneBatch* takes a *NeuralNet* object
+representing the neural net, and visits layers of the *NeuralNet* in
+certain order. The resultant gradients are sent to the local stub that
+aggregates the requests and forwards them to corresponding servers for
+updating. Servers reply to workers with the updated parameters for the next
+iteration.
+
+
+## Job submission
+
+To submit a job in SINGA (i.e., training a deep learning model),
+users pass the job configuration to SINGA driver in the
+[main function](programming-guide.html). The job configuration
+specifies the four major components in Figure 2,
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+This process is like the job submission in Hadoop, where users configure their
+jobs in the main function to set the mapper, reducer, etc.
+In Hadoop, users can configure their jobs with their own (or built-in) mapper and reducer; in SINGA, users
+can configure their jobs with their own (or built-in) layer, updater, etc.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/param.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/param.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. *Basic user guide* describes how to configure for a `Param`
+object, and *Advanced user guide* provides details on implementing users'
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is inside a layer configuration, as
+`Param` objects are associated with layers. An example configuration is like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](overview.html) starts with initializing all
+parameters according to the user-specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to the constant value 1. The
+configuration fields of a Param object are defined in [ParamProto](../api/classsinga_1_1ParamProto.html):
+
+  * name, an identifier string. It is an optional field. If not provided, SINGA
+  will generate one based on the layer name and its order in the layer.
+  * init, field for setting initialization methods.
+  * share_from, name of another `Param` object, from which this `Param` will share
+  configurations and values.
+  * lr_scale, a float value to be multiplied with the learning rate when
+  [updating the parameters](updater.html).
+  * wd_scale, a float value to be multiplied with the weight decay when
+  [updating the parameters](updater.html).
+
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` using one of the following built-in initialization
+methods,
+
+  * `kConst`, set all parameters of the Param object to a constant value
+
+        type: kConst
+        value: float  # default is 1
+
+  * `kGaussian`, initialize the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # default 0
+
+  * `kUniform`, initialize the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # default 0
+
+  * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e.,
+  matrices) using `kGaussian` and then
+  multiply each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of
+  columns of the matrix.
+
+  * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
+  distribution is a uniform distribution.
+
+  * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then
+  multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
+  fan_out` is the sum of the number of columns and rows of the matrix.
+
+For all the above initialization methods except `kConst`, if `value` is not
+1, every parameter will be multiplied by `value`. Users can also implement
+their own initialization method following the *Advanced user guide*.
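+
+As a concrete (NumPy-based, illustrative) example of the fan-in scaled methods,
+`kGaussianSqrtFanIn` can be pictured as drawing a Gaussian matrix and scaling it
+by `1/sqrt(fan_in)`, with `fan_in` being the number of columns:
+
+    import numpy as np
+
+    def gaussian_sqrt_fan_in(rows, cols, mean=0.0, std=1.0, value=1.0):
+        # draw from the Gaussian, scale by 1/sqrt(fan_in), then by `value`
+        fan_in = cols
+        w = np.random.normal(mean, std, size=(rows, cols))
+        return value * w / np.sqrt(fan_in)
+
+    w = gaussian_sqrt_fan_in(2500, 784)
+    print(w.std())  # roughly 1/sqrt(784), i.e., about 0.036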
+
+
+## Advanced user guide
+
+This section describes the details on implementing new parameter
+initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as
+subclasses of the base `ParamGenerator` class.
+
+    class ParamGenerator {
+     public:
+      virtual void Init(const ParamGenProto&);
+      void Fill(Param*);
+
+     protected:
+      ParamGenProto proto_;
+    };
+
+The configuration of the initialization method is in `ParamGenProto`. The `Fill`
+function fills the `Param` object (passed in as an argument).
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration
+protocol message,
+
+    # in user.proto
+    message FooParamProto {
+      optional int32 x = 1;
+    }
+    extend ParamGenProto {
+      optional FooParamProto fooparam_conf = 101;
+    }
+
+The configuration of `Param` would be
+
+    param {
+      ...
+      init {
+        user_type: "FooParam" # must use user_type for user defined methods
+        [fooparam_conf] { # must use brackets for configuring user defined messages
+          x: 10
+        }
+      }
+    }
+
+The subclass could be declared as,
+
+    class FooParamGen : public ParamGenerator {
+     public:
+      void Fill(Param*) override;
+    };
+
+Users can access the configuration fields in `Fill` by
+
+    int x = proto_.GetExtension(fooparam_conf).x();
+
+To use the new initialization method, users need to register it in the
+[main function](programming-guide.html).
+
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in the configuration
+
+{% comment %}
+### Base Param class
+
+### Members
+
+    int local_version_;
+    int slice_start_;
+    vector<int> slice_offset_, slice_size_;
+
+    shared_ptr<Blob<float>> data_;
+    Blob<float> grad_;
+    ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data
+Blob). These two versions are used for synchronization. If multiple Param
+objects share the same values, they would have the same `data_` field.
+Consequently, their global version is the same. The global version is updated
+by [the stub thread](communication.html). The local version is
+updated in `Worker::Update` function which assigns the global version to the
+local version. The `Worker::Collect` function is blocked until the global
+version is larger than the local version, i.e., when `data_` is updated. In
+this way, we synchronize workers sharing parameters.
+
+In deep learning models, some Param objects are 100 times larger than others.
+To ensure the load-balance among servers, SINGA slices large Param objects. The
+slicing information is recorded by `slice_*`. Each slice is assigned a unique
+ID starting from 0. `slice_start_` is the ID of the first slice of this Param
+object. `slice_offset_[i]` is the offset of the i-th slice in this Param
+object. `slice_size_[i]` is the size of the i-th slice. This slice information
+is used to create messages for transferring parameter values or gradients to
+different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share
+this Blob although they may share `data_`, because each layer containing a
+Param object contributes gradients. E.g., in RNN, the recurrent layers
+share parameter values, and the gradients used for updating are averaged over all
+these recurrent layers. In SINGA, the stub thread will aggregate local
+gradients for the same Param object. The server will do a global aggregation
+of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., name and ID. It also has a
+field called `owner` which is the ID of the Param object that shares parameter
+values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+    virtual void InitValues(int version = 0);  // initialize values according to `init_method`
+    void ShareFrom(const Param& other);  // share `data_` from `other` Param
+    --------------
+    virtual Msg* GenGetMsg(bool copy, int slice_idx);
+    virtual Msg* GenPutMsg(bool copy, int slice_idx);
+    ... // other message related functions.
+
+Besides the functions for processing the parameter values, there is a set of
+functions for generating and parsing messages. These messages are for
+transferring parameter values or gradients between workers and servers. Each
+message corresponds to one Param slice. If `copy` is false, it means the
+receiver of this message is in the same process as the sender. In such case,
+only pointers to the memory of parameter value (or gradient) are wrapped in
+the message; otherwise, the parameter values (or gradients) should be copied
+into the message.
+
+
+## Implementing Param subclass
+Users can extend the base Param class to implement their own parameter
+initialization methods and message transferring protocols. Similar to
+implementing a new Layer subclass, users can create Google protocol buffer
+messages for configuring the Param subclass. The subclass, denoted as FooParam,
+should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is for the base Param type
+
+
+  * type, an integer representing the `Param` type. Currently SINGA provides one
+    `Param` implementation with type 0 (the default type). If users want
+    to use their own Param implementation, they should extend the base Param
+    class and configure this field with `kUserParam`
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,95 @@
+# Programming Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components; while the *Advanced user guide* section presents details
+on writing user's own main function to register components implemented by
+themselves. In addition, the training data must be prepared, which has the same
+[process](data.html) for both advanced users and basic users.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. In this case, a job configuration file written as a Google protocol
+buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided on the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume]
+
+`-resume` is for continuing the training from last
+[checkpoint](checkpoint.html).
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g.,
+[Updater](updater.html), he has to write a main function to
+register these components. It is similar to Hadoop's main function. Generally,
+the main function should
+
+  * initialize SINGA, e.g., setup logging.
+
+  * register user-defined components.
+
+  * create and pass the job configuration to SINGA driver
+
+
+An example main function is like
+
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse resume option from argv.
+
+      // register user defined layers
+      driver.RegisterLayer<FooLayer>(kFooLayer);
+      // register user defined updater
+      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
+      ...
+      auto jobConf = driver.job_conf();
+      //  update jobConf
+
+      driver.Train(resume, jobConf);
+      return 0;
+    }
+
+The Driver class' `Init` method will load the job configuration file provided by
+users as a command line argument (`-conf <job conf>`). The job configuration contains at least the
+cluster topology; `driver.job_conf()` returns the `jobConf` for users to update or fill in
+configurations of neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker and Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable file, e.g., named *mysinga*.  To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/python.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/python.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/python.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/python.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,374 @@
+# Python Binding
+
+---
+
+The Python binding provides APIs for configuring a training job following
+[keras](http://keras.io/), including the configuration of the neural net, training
+algorithm, etc.  It replaces the configuration file (e.g., *job.conf*) in
+protobuf format, which is typically long and error-prone to prepare. In later
+versions, we will add Python functions to interact with the layer and neural net
+objects, which would enable users to train and debug their models
+interactively.
+
+Here is the layout of python related code,
+
+    SINGAROOT/tool/python
+    |-- pb2 (has job_pb2.py)
+    |-- singa
+        |-- model.py
+        |-- layer.py
+        |-- parameter.py
+        |-- initialization.py
+        |-- utils
+            |-- utility.py
+            |-- message.py
+    |-- examples
+        |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
+        |-- datasets
+            |-- cifar10.py
+            |-- mnist.py
+
+## Compiling and running instructions
+
+In order to use the Python APIs, users need to add the following arguments when compiling
+SINGA,
+
+    ./configure --enable-python --with-python=PYTHON_DIR
+    make
+
+where PYTHON_DIR is the directory that contains Python.h.
+
+
+The training program is launched by
+
+    bin/singa-run.sh -exec <user_main.py>
+
+where user_main.py creates the JobProto object and passes it to Driver::Train to
+start the training.
+
+For example,
+
+    cd SINGAROOT
+    bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
+
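+A minimal *user_main.py* has the shape sketched below. This is only a sketch:
+the import paths are assumptions inferred from the directory layout above, and
+the layer sizes are placeholders; complete, working programs are listed in the
+Examples section below.
+
+```
+import sys
+# assumed import paths, based on the tool/python layout shown above
+from singa.model import *              # Sequential, Dense, SGD, Cluster, ...
+from examples.datasets import mnist    # helper that prepares the data layer configuration
+
+X_train, X_test, workspace = mnist.load_data()
+
+m = Sequential('mlp', sys.argv)                         # build the neural net
+m.add(Dense(100, init='uniform', activation='tanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+
+sgd = SGD(lr=0.001, lr_type='step')                     # training algorithm
+topo = Cluster(workspace)                               # cluster topology
+m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
+
+# fit() starts the training; the JobProto described above is assembled
+# from this configuration and handed to Driver::Train
+m.fit(X_train, nb_epoch=100)
+```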
+
+
+## Examples
+
+
+### MLP Example
+
+This example uses python APIs to configure and train an MLP model over the MNIST
+dataset. The configuration content is the same as that written in *SINGAROOT/examples/mnist/job.conf*.
+
+```
+X_train, X_test, workspace = mnist.load_data()
+
+m = Sequential('mlp', sys.argv)
+
+m.add(Dense(2500, init='uniform', activation='tanh'))
+m.add(Dense(2000, init='uniform', activation='tanh'))
+m.add(Dense(1500, init='uniform', activation='tanh'))
+m.add(Dense(1000, init='uniform', activation='tanh'))
+m.add(Dense(500,  init='uniform', activation='tanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+
+sgd = SGD(lr=0.001, lr_type='step')
+topo = Cluster(workspace)
+m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
+```
+
+### CNN Example
+
+This example uses python APIs to configure and train a CNN model over the Cifar10
+dataset. The configuration content is the same as that written in *SINGAROOT/examples/cifar10/job.conf*.
+
+
+```
+X_train, X_test, workspace = cifar10.load_data()
+
+m = Sequential('cnn', sys.argv)
+
+m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
+m.add(MaxPooling2D(pool_size=(3,3), stride=2))
+m.add(Activation('relu'))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(64, 5, 1, 2))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+
+m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))
+
+sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
+topo = Cluster(workspace)
+m.compile(updater=sgd, cluster=topo)
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
+```
+
+
+### RBM Example
+
+This example uses python APIs to configure and train an RBM model over the MNIST
+dataset. The configuration content is the same as that written in the
+*rbm\*.conf* files under *SINGAROOT/examples/rbm/*.
+
+```
+rbmid = 3
+X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
+m = Energy('rbm'+str(rbmid), sys.argv)
+
+out_dim = [1000, 500, 250]
+m.add(RBM(out_dim, w_std=0.1, b_wd=0))
+
+sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
+topo = Cluster(workspace)
+m.compile(optimizer=sgd, cluster=topo)
+m.fit(X_train, alg='cd', nb_epoch=6000)
+```
+
+### AutoEncoder Example
+This example uses python APIs to configure and train an autoencoder model over
+the MNIST dataset. The configuration content is the same as that written in
+*SINGAROOT/examples/rbm/autoencoder.conf*.
+
+
+```
+rbmid = 4
+X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
+m = Sequential('autoencoder', sys.argv)
+
+hid_dim = [1000, 500, 250, 30]
+m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))
+
+agd = AdaGrad(lr=0.01)
+topo = Cluster(workspace)
+m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
+m.fit(X_train, alg='bp', nb_epoch=12200)
+```
+
+### To run SINGA on GPU
+
+Users need to assign a list of GPU ids to the `device` field in fit() or evaluate().
+The number of GPUs must be the same as the number of workers configured in the
+cluster topology.
+
+
+```
+gpu_id = [0]
+m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
+```
+
+### TIPS
+
+Hidden layers for MLP can be configured as
+
+```
+for n in [2500, 2000, 1500, 1000, 500]:
+  m.add(Dense(n, init='uniform', activation='tanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+```
+
+An Activation layer can be specified separately
+
+```
+m.add(Dense(2500, init='uniform'))
+m.add(Activation('tanh'))
+```
+
+Users can explicitly specify the hyper-parameters of the weights and biases
+
+```
+par = Parameter(init='uniform', scale=0.05)
+m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
+```
+
+Similarly, for the CNN example, Parameter objects for weights and biases can be
+created once, reused across layers, and updated in between,
+
+```
+parw = Parameter(init='gauss', std=0.0001)
+parb = Parameter(init='const', value=0)
+m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
+m.add(MaxPooling2D(pool_size=(3,3), stride=2))
+m.add(Activation('relu'))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+parw.update(std=0.01)
+m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+
+m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
+```
+
+
+Data can be added in this way,
+
+```
+X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
+m.fit(X_train, ...)                  # Data layer for training is added
+m.evaluate(X_test, ...)              # Data layer for testing is added
+```
+or this way,
+
+```
+X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
+m.add(X_train)                       # explicitly add Data layer
+m.add(X_test)                        # explicitly add Data layer
+```
+
+or in this way, where the Store parameters are set explicitly,
+
+```
+store = Store(path='train.bin', batch_size=64, ...)        # parameter values are set explicitly
+m.add(Data(load='recordinput', phase='train', conf=store)) # Data layer is added
+store = Store(path='test.bin', batch_size=100, ...)        # parameter values are set explicitly
+m.add(Data(load='recordinput', phase='test', conf=store))  # Data layer is added
+```
+
+
+### Cases to run SINGA
+
+(1) Run SINGA for training
+
+```
+m.fit(X_train, nb_epoch=1000)
+```
+
+(2) Run SINGA for training and validation
+
+```
+m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
+```
+
+(3) Run SINGA for test while training
+
+```
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, batch_size=100, test_steps=100)
+```
+
+(4) Run SINGA for test only, assuming a checkpoint exists after training
+
+```
+result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
+```
+
+
+## Implementation Details
+
+### Layer class (inherited)
+
+* Data
+* Dense
+* Activation
+* Convolution2D
+* MaxPooling2D
+* AvgPooling2D
+* LRN2D
+* Dropout
+* RBM
+* Autoencoder
+
+### Model class
+
+The Model class has a `jobconf` field (a JobProto object) and a `layers` field (a list of layers).
+
+Methods in the Model class:
+
+* add
+	* adds a Layer to the Model
+	* Model has 2 subclasses: the Sequential model and the Energy model
+
+* compile
+	* sets the Updater (i.e., optimizer) and Cluster (i.e., topology) components
+
+* fit
+	* sets the training data and parameter values for the training
+		* (optional) sets the validation data and parameter values
+	* sets the Train_one_batch component
+	* specify the `with_test` field if a user wants to run SINGA with test data simultaneously.
+	* [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.
+
+* evaluate
+	* sets the testing data and parameter values for the testing
+	* specify the `checkpoint_path` field if a user wants to run SINGA only for testing.
+	* [TODO] receive test results, e.g., accuracy, loss, ppl, etc.
+
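+A short sketch of how these methods fit together (it simply reuses the classes
+and arguments from the MLP example above, with comments marking the Model
+method or component involved at each step):
+
+```
+m = Sequential('mlp', sys.argv)        # Sequential is one of the 2 Model subclasses
+
+m.add(Dense(2500, init='uniform', activation='tanh'))    # add: append layers
+m.add(Dense(10, init='uniform', activation='softmax'))
+
+sgd = SGD(lr=0.001, lr_type='step')    # Updater component
+topo = Cluster(workspace)              # Cluster (topology) component
+m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
+
+# fit: training data, Train_one_batch component, and (optionally) test data
+m.fit(X_train, nb_epoch=1000, with_test=True)
+# evaluate: testing data and parameters; returns the results described below
+result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
+```
+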
+### Results
+
+fit() and evaluate() return train/test results as a dictionary containing
+
+* [key]: step number
+* [value]: a list of dictionaries, whose keys may include
+	* 'acc' for accuracy
+	* 'loss' for loss
+	* 'ppl' for perplexity
+	* 'se' for squared error
+
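+As a sketch of how these results might be consumed (assuming the dictionary
+structure described above; `result` comes from the evaluate() call in the MLP
+example):
+
+```
+# result maps a step number to a list of metric dictionaries
+for step, records in sorted(result.items()):
+    for record in records:
+        # keys such as 'acc' and 'loss' are present only if reported at that step
+        print(step, record.get('acc'), record.get('loss'))
+```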
+
+### Parameter class
+
+Users need to set the parameter fields and initialization values. For example,
+
+* Parameter (fields in Param proto)
+	* lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters.
+	* wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters.
+
+* Parameter initialization (fields in ParamGen proto)
+	* init = (string) // one of the types, 'uniform', 'constant', 'gaussian'
+	* high = (float)  // for 'uniform'
+	* low = (float)   // for 'uniform'
+	* value = (float) // for 'constant'
+	* mean = (float)  // for 'gaussian'
+	* std = (float)   // for 'gaussian'
+
+* Weight (`w_param`) is 'gaussian' with mean=0 and std=0.01 by default
+
+* Bias (`b_param`) is 'constant' with value=0 by default
+
+* How to set the parameter fields in a layer
+	* to set a Weight field, put `w_` in front of the field name
+	* to set a Bias field, put `b_` in front of the field name
+
+Several ways to set Parameter values
+
+```
+parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
+parb = Parameter(lr=1, wd=0, init='constant', value=0)
+m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
+```
+
+```
+m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
+```
+
+```
+parw = Parameter(init='constant', mean=0)
+m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
+```
+
+### Other classes
+
+* Store
+* Algorithm
+* Updater
+* SGD
+* AdaGrad
+* Cluster

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,187 @@
+# Quick Start
+
+---
+
+## SINGA setup
+
+Please refer to the
+[installation](installation.html) page
+for guidance on installing SINGA.
+
+### Starting Zookeeper
+
+SINGA uses [zookeeper](https://zookeeper.apache.org/) to coordinate the
+training.  Please make sure the zookeeper service is started before running
+SINGA.
+
+If you installed the zookeeper using our thirdparty script, you can
+simply start it by:
+
+    #goto top level folder
+    cd  SINGA_ROOT
+    ./bin/zk-service.sh start
+
+(`./bin/zk-service.sh stop` stops the zookeeper).
+
+Otherwise, if you launched zookeeper by yourself and did not use the
+default port, please edit `conf/singa.conf`:
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+## Running in standalone mode
+
+Running SINGA in standalone mode means running it without cluster
+managers like [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+
+### Training on a single node
+
+For single node training, one process will be launched to run SINGA on the
+local host. We train the [CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) over the
+[CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset as an example.
+The hyper-parameters are set following
+[cuda-convnet](https://code.google.com/p/cuda-convnet/). More details are
+available in the [CNN example](cnn.html).
+
+
+#### Preparing data and job configuration
+
+Download the dataset and create the data shards for training and testing.
+
+    cd examples/cifar10/
+    cp Makefile.example Makefile
+    make download
+    make create
+
+A training dataset and a test dataset are created under the *cifar10-train-shard*
+and *cifar10-test-shard* folders respectively. An *image_mean.bin* file is also
+generated, which contains the feature mean of all images.
+
+Since all code used for training this CNN model is provided by SINGA as
+built-in implementations, there is no need to write any code. Instead, users just
+execute the running script (*../../bin/singa-run.sh*) and provide the job
+configuration file (*job.conf*). To write your own code for SINGA, please refer to the
+[programming guide](programming-guide.html).
+
+#### Training without parallelism
+
+By default, the cluster topology has a single worker and a single server.
+In other words, neither the training data nor the neural net is partitioned.
+
+The training is started by running:
+
+    # goto top level folder
+    cd ../../
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+
+You can list the currently running jobs by,
+
+    ./bin/singa-console.sh list
+
+    JOB ID    |NUM PROCS
+    ----------|-----------
+    24        |1
+
+Jobs can be killed by,
+
+    ./bin/singa-console.sh kill JOB_ID
+
+
+Logs and job information are available in the */tmp/singa-log* folder, which can be
+changed to another folder by setting `log-dir` in *conf/singa.conf*.
+
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](architecture.html) is enabled by launching
+multiple worker groups. For example, we can change the original *job.conf* to
+have two worker groups as shown above. By default, each worker group has one
+worker. Since one process is configured to contain two workers, the two worker
+groups will run in the same process. Consequently, they follow the in-memory
+[Downpour](frameworks.html) training framework. Users do not need to split the
+dataset explicitly for each worker (group); instead, they can assign each
+worker (group) a random offset to the start of the dataset, as shown below. The
+workers then effectively run on different data partitions.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        sharddata_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+The running command is:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [synchronous training](architecture.html) is enabled
+by launching multiple workers within one worker group. For instance, we can
+change the original *job.conf* to have two workers in one worker group as shown
+above. The workers will run synchronously
+as they are from the same worker group. This framework is the in-memory
+[Sandblaster](frameworks.html).
+The model is partitioned among the two workers. Specifically, each layer is
+sliced over the two workers. The sliced layer
+is the same as the original layer except that it only has `B/g` feature
+instances, where `B` is the number of instances in a mini-batch and `g` is the number of
+workers in a group (e.g., with `B=64` and `g=2`, each worker holds 32 instances
+per layer). It is also possible to partition the layer (or neural net)
+using [other schemes](neural-net.html).
+All other settings are the same as running without partitioning:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+### Training in a cluster
+
+We can extend the above two training frameworks to a cluster by updating the
+cluster configuration with:
+
+    nworkers_per_procs: 1
+
+Every process would then create only one worker thread. Consequently, the workers
+would be created in different processes (i.e., on different nodes). A *hostfile*
+must be provided under *SINGA_ROOT/conf/*, specifying the nodes in the cluster,
+e.g.,
+
+    logbase-a01
+    logbase-a02
+
+And the zookeeper location must be configured correctly, e.g.,
+
+    #conf/singa.conf
+    zookeeper_host: "logbase-a01"
+
+The running command is the same as for single node training:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+## Running with Mesos
+
+*Work in progress*...
+
+## Where to go next
+
+The [programming guide](programming-guide.html) pages will
+describe how to submit a training job in SINGA.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,365 @@
+# RBM Example
+
+---
+
+This example uses SINGA to train 4 RBM models and one auto-encoder model over the
+[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
+to reduce the dimensionality of the MNIST image feature. The RBM models are trained
+to initialize parameters of the auto-encoder model. This example application is
+from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).
+
+## Running instructions
+
+Running scripts are provided in *SINGA_ROOT/examples/rbm* folder.
+
+The MNIST dataset has 70,000 handwritten digit images. The
+[data preparation](data.html) page
+has details on converting this dataset into a SINGA-recognizable format. Users can
+simply run the following commands to download and convert the dataset.
+
+    # at SINGA_ROOT/examples/mnist/
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+The training is separated into two phases, namely pre-training and fine-tuning.
+The pre-training phase trains 4 RBMs in sequence,
+
+    # at SINGA_ROOT/
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf
+
+The fine-tuning phase trains the auto-encoder by,
+
+    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf
+
+
+## Training details
+
+### RBM1
+
+<img src="../images/example-rbm1.png" align="center" width="200px"/>
+<span><strong>Figure 1 - RBM1.</strong></span>
+
+The neural net structure for training RBM1 is shown in Figure 1.
+The data layer and parser layer provide features for training RBM1.
+The visible layer of RBM1 (connected to the parser layer) accepts the image features
+(784 dimensions). The hidden layer is set to have 1000 neurons (units).
+These two layers are configured as,
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"mnist"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 1000
+      }
+      param{
+        name: "w1"
+        init{
+          type: kGaussian
+          mean: 0.0
+          std: 0.1
+        }
+      }
+      param{
+        name: "b11"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 1000
+      }
+      param{
+        name: "w1_"
+        share_from: "w1"
+      }
+      param{
+        name: "b12"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+
+
+For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
+`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
+the `share_from` field to enable [parameter sharing](param.html),
+as shown above for the params `w1` and `w1_`.
+
+[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
+is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
+Following Hinton's paper, we configure the [updating protocol](updater.html)
+as follows,
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.2
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.1
+        type: kFixed
+      }
+    }
+
+Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
+configure the `workspace` field to specify a path for the checkpoint folder.
+For example, if we configure it as,
+
+    cluster {
+      workspace: "examples/rbm/rbm1/"
+    }
+
+Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.
+
+### RBM2
+<img src="../images/example-rbm2.png" align="center" width="200px"/>
+<span><strong>Figure 2 - RBM2.</strong></span>
+
+Figure 2 shows the net structure for training RBM2.
+The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
+is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
+from RBM1.
+The neural net configuration is (with the data layer and parser layer omitted),
+
+    layer{
+      name: "Inner1"
+      type: kInnerProduct
+      srclayers:"mnist"
+      innerproduct_conf{
+        num_output: 1000
+      }
+      param{ name: "w1" }
+      param{ name: "b12"}
+    }
+
+    layer{
+      name: "Sigmoid1"
+      type: kSigmoid
+      srclayers:"Inner1"
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"Sigmoid1"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 500
+      }
+      param{
+        name: "w2"
+        ...
+      }
+      param{
+        name: "b21"
+        ...
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 500
+      }
+      param{
+        name: "w2_"
+        share_from: "w2"
+      }
+      param{
+        name: "b22"
+        ...
+      }
+    }
+
+To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,
+
+    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
+    cluster{
+      workspace: "examples/rbm/rbm2"
+    }
+
+The workspace is changed for checkpointing `w2`, `b21` and `b22` into
+*examples/rbm/rbm2/*.
+
+### RBM3
+
+<img src="../images/example-rbm3.png" align="center" width="200px"/>
+<span><strong>Figure 3 - RBM3.</strong></span>
+
+Figure 3 shows the net structure for training RBM3. In this model, a layer with
+250 units is added as the hidden layer of RBM3. The visible units of RBM3
+accept output from the Sigmoid2 layer. Parameters of Inner1 and Inner2 are set to
+`w1, b12, w2, b22`, which can be loaded from the checkpoint files of RBM2
+under "examples/rbm/rbm2/".
+
+### RBM4
+
+
+<img src="../images/example-rbm4.png" align="center" width="200px"/>
+<span><strong>Figure 4 - RBM4.</strong></span>
+
+Figure 4 shows the net structure for training RBM4. It is similar to Figure 3,
+but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
+top RBM (RBM4) have stochastic real-valued states drawn from a unit variance
+Gaussian whose mean is determined by the input from the RBM's logistic visible
+units. So we add a `gaussian` field in the RBMHid layer to control the
+sampling distribution (Gaussian or Bernoulli). In addition, this
+RBM has a much smaller learning rate (0.001). The neural net configuration for
+RBM4 and the updating protocol are (with the data layer and parser
+layer omitted),
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.001
+        type: kFixed
+      }
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"Sigmoid3"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 30
+      }
+      param{
+        name: "w4"
+        ...
+      }
+      param{
+        name: "b41"
+        ...
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 30
+        gaussian: true
+      }
+      param{
+        name: "w4_"
+        share_from: "w4"
+      }
+      param{
+        name: "b42"
+        ...
+      }
+    }
+
+### Auto-encoder
+In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
+networks that are initialized using the parameters learned in the pre-training phase.
+
+<img src="../images/example-autoencoder.png" align="center" width="500px"/>
+<span><strong>Figure 5 - Auto-Encoders.</strong></span>
+
+
+Figure 5 shows the neural net structure for training the auto-encoder.
+[Back propagation (kBP)](train-one-batch.html) is
+configured as the algorithm for `TrainOneBatch`. We use the same cluster
+configuration as for the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with a
+fixed learning rate.
+
+    ### Updater Configuration
+    updater{
+      type: kAdaGrad
+      learning_rate{
+        base_lr: 0.01
+        type: kFixed
+      }
+    }
+
+
+
+According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
+we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
+configuration is (with some of the middle layers omitted),
+
+    layer{ name: "data" }
+    layer{ name:"mnist" }
+    layer{
+      name: "Inner1"
+      param{ name: "w1" }
+      param{ name: "b12" }
+    }
+    layer{ name: "Sigmoid1" }
+    ...
+    layer{
+      name: "Inner8"
+      innerproduct_conf{
+        num_output: 784
+        transpose: true
+      }
+      param{
+        name: "w8"
+        share_from: "w1"
+      }
+      param{ name: "b11" }
+    }
+    layer{ name: "Sigmoid8" }
+
+    # Euclidean Loss Layer Configuration
+    layer{
+      name: "loss"
+      type:kEuclideanLoss
+      srclayers:"Sigmoid8"
+      srclayers:"mnist"
+    }
+
+To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as
+
+    ### Checkpoint Configuration
+    checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"
+
+
+## Visualization Results
+
+<div>
+<img src="../images/rbm-weight.PNG" align="center" width="300px"/>
+
+<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
+<br/>
+<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
+&nbsp;
+&nbsp;
+&nbsp;
+&nbsp;
+
+<span><strong>Figure 7 - Top layer features.</strong></span>
+</div>
+
+Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
+Gabor-like filters are learned. Figure 7 depicts the features extracted from
+the top layer of the auto-encoder, wherein one point represents one image.
+Different colors represent different digits. We can see that most images are
+well clustered according to the ground truth.