Posted to commits@singa.apache.org by wa...@apache.org on 2016/04/20 07:09:07 UTC

svn commit: r1740048 [4/10] - in /incubator/singa/site/trunk/content/markdown: ./ develop/ docs/ docs/kr/ v0.3.0/ v0.3.0/jp/ v0.3.0/kr/ v0.3.0/zh/

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/overview.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/overview.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,68 @@
+# Introduction
+
+---
+
+SINGA is a distributed deep learning platform for training deep learning models over large-scale data analytics workloads. It is designed for intuitive programming based on the layer abstraction of the model network.
+
+* It supports a variety of models, including feed-forward networks such as Convolutional Neural Networks (CNN), energy models such as Restricted Boltzmann Machines (RBM), and Recurrent Neural Network (RNN) models.
+
+* Many common layers are provided as built-in layers.
+
+* The SINGA architecture is designed to run synchronous, asynchronous, and hybrid training.
+
+* It supports neural network partitioning schemes (batch and feature partitioning) to parallelize the training of large models.
+
+
+## Goals
+
+Scalability: as a distributed system, utilize more resources to improve the speed of training until a given accuracy is reached.
+
+Usability: simplify the work that is time-consuming for programmers yet required for efficient distributed training of large models, such as data and model partitioning and network communication, and thereby make it easy to implement deep, complex models and algorithms.
+
+
+## Design principles
+
+Scalability is an important research topic in distributed deep learning.
+SINGA is designed to keep a variety of training frameworks scalable:
+* Synchronous: maximizes the benefit of each training step.
+* Asynchronous: improves the convergence rate of training.
+* Hybrid: balances per-step efficiency and convergence rate against cost and resources (e.g., cluster size) for better scalability.
+
+SINGA is designed for intuitive programming based on the layer abstraction of deep learning models. A wide variety of models can be implemented and trained with it.
+
+## System overview
+
+<img src="../../images/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow</strong></span>
+
+"Training a deep learning model" means searching for the optimal parameters of the
+transformation functions that generate the features used to accomplish a specific task (e.g., classification or prediction).
+The quality of the parameters is judged by a loss function such as the [Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since this function is generally non-linear or non-convex, it is hard to find a closed-form solution.
+
+Instead, Stochastic Gradient Descent (SGD) is used.
+As shown in Figure 1, parameter values are randomly initialized and then updated iteratively so that the loss function decreases.
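+
+For reference, one SGD step updates each parameter as follows (a standard
+formulation with learning rate `$\eta$`, not SINGA-specific notation):
+
+`$$w_{t+1} = w_t - \eta \frac{\partial L}{\partial w_t}$$`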
+
+<img src="../../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview</strong></span>
+
+The training workload is distributed over workers and servers. As shown in Figure 2, in each iteration the workers call the *TrainOneBatch* function to compute parameter gradients.
+*TrainOneBatch* takes a *NeuralNet* object describing the neural net structure and visits the layers in order.
+The computed gradients are sent to the stub on the local node, which aggregates them and sends them to the corresponding servers. The servers send the updated parameters back to the workers for the next iteration.
+
+
+## Job
+
+In SINGA, a "job" refers to a "job configuration" that describes the neural network model, the data, the training method, the cluster topology, and so on.
+A job configuration specifies the following four components shown in Figure 2 (a combined configuration sketch follows the list):
+
+  * [NeuralNet](neural-net.html): describes the neural net structure and the settings of each layer.
+  * [TrainOneBatch](train-one-batch.html): describes the algorithm suited to each model category.
+  * [Updater](updater.html): describes how parameters are updated.
+  * [Cluster Topology](distributed-training.html): describes the distributed topology of workers and servers.
+
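+A minimal job configuration sketch combining these four components might look
+like the following (the field names follow examples elsewhere in these docs;
+the values are illustrative):
+
+    # job.conf (sketch)
+    neuralnet {
+      layer { ... }
+    }
+    train_one_batch {
+      alg: kBackPropagation
+    }
+    updater {
+      type: kSGD
+      learning_rate {
+        type: kFixed
+        base_lr: 0.01
+      }
+    }
+    cluster {
+      nworkers_per_group: 1
+      workspace: "examples/cifar10"
+    }
+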
+The job is passed to the SINGA driver in the [main function](programming-guide.html).
+
+This process is similar to job submission in Hadoop:
+users configure the job inside the main function.
+Hadoop users configure their own mappers and reducers; SINGA users configure their own layers, updaters, and so on.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/param.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/param.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. *Basic user guide* describes how to configure for a `Param`
+object, and *Advanced user guide* provides details on implementing users'
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is nested inside a layer configuration, as
+`Param` objects are associated with layers. An example configuration looks like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](overview.html) starts by initializing all
+parameters according to the user-specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to the constant value 1. The
+configuration fields of a Param object are defined in [ParamProto](../api/classsinga_1_1ParamProto.html):
+
+  * name, an identifier string. It is an optional field. If not provided, SINGA
+  will generate one based on layer name and its order in the layer.
+  * init, field for setting initialization methods.
+  * share_from, name of another `Param` object, from which this `Param` will share
+  configurations and values.
+  * lr_scale, float value to be multiplied with the learning rate when
+  [updating the parameters](updater.html)
+  * wd_scale, float value to be multiplied with the weight decay when
+  [updating the parameters](updater.html)
+
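+For illustration, a `Param` that shares values with "p1" above and uses twice the
+base learning rate might be configured as follows (a hypothetical example built
+from the fields just listed):
+
+    param {
+      name : "p2"
+      share_from : "p1"
+      lr_scale : 2.0
+    }
+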
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` to one of the following built-in initialization
+methods:
+
+  * `kConstant`, set all parameters of the Param object to a constant value
+
+        type: kConstant
+        value: float  # default is 1
+
+  * `kGaussian`, initialize the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # default 0
+
+  * `kUniform`, initialize the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # default 0
+
+  * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e.,
+  matrices) using `kGaussian` and then
+  multiply each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of
+  columns of the matrix.
+
+  * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
+  distribution is a uniform distribution.
+
+  * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then
+  multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
+  fan_out` sums the number of columns and rows of the matrix.
+
+For all of the above initialization methods except `kConstant`, if their `value` is not
+1, every parameter will additionally be multiplied by `value`. Users can also implement
+their own initialization methods following the *Advanced user guide*.
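+
+For example, the following (illustrative) configuration draws each parameter from
+a Gaussian with mean 0 and standard deviation 1, and then scales it by 0.01:
+
+    init {
+      type: kGaussian
+      mean: 0
+      std: 1
+      value: 0.01
+    }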
+
+
+## Advanced user guide
+
+This section describes the details of implementing new parameter
+initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as
+subclasses of the base `ParamGenerator` class.
+
+    class ParamGenerator {
+     public:
+      virtual void Init(const ParamGenProto&);
+      void Fill(Param*);
+
+     protected:
+      ParamGenProto proto_;
+    };
+
+The configuration of the initialization method is stored in `ParamGenProto`. The `Fill`
+function fills the `Param` object (passed in as an argument) with generated values.
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration
+protocol message,
+
+    # in user.proto
+    message FooParamProto {
+      optional int32 x = 1;
+    }
+    extend ParamGenProto {
+      optional FooParamProto fooparam_conf = 101;
+    }
+
+The configuration of `Param` would be
+
+    param {
+      ...
+      init {
+        user_type: "FooParam"  # must use user_type for user-defined methods
+        [fooparam_conf] { # must use brackets for configuring user defined messages
+          x: 10
+        }
+      }
+    }
+
+The subclass could be declared as,
+
+    class FooParamGen : public ParamGenerator {
+     public:
+      void Fill(Param*) override;
+    };
+
+Users can access the configuration fields in `Fill` by
+
+    int x = proto_.GetExtension(fooparam_conf).x();
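+
+A minimal sketch of the `Fill` implementation, assuming `Param` exposes a mutable
+`Blob<float>` (the accessor names below are assumptions for illustration):
+
+    void FooParamGen::Fill(Param* param) {
+      int x = proto_.GetExtension(fooparam_conf).x();
+      // Hypothetical accessors: fill every parameter with the configured value.
+      Blob<float>* data = param->mutable_data();
+      float* ptr = data->mutable_cpu_data();
+      for (int i = 0; i < data->count(); i++)
+        ptr[i] = static_cast<float>(x);
+    }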
+
+To use the new initialization method, users need to register it in the
+[main function](programming-guide.html).
+
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in the configuration
+
+{% comment %}
+### Base Param class
+
+### Members
+
+    int local_version_;
+    int slice_start_;
+    vector<int> slice_offset_, slice_size_;
+
+    shared_ptr<Blob<float>> data_;
+    Blob<float> grad_;
+    ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data
+Blob). These two versions are used for synchronization. If multiple Param
+objects share the same values, they would have the same `data_` field.
+Consequently, their global version is the same. The global version is updated
+by [the stub thread](communication.html). The local version is
+updated in `Worker::Update` function which assigns the global version to the
+local version. The `Worker::Collect` function is blocked until the global
+version is larger than the local version, i.e., when `data_` is updated. In
+this way, we synchronize workers sharing parameters.
+
+In deep learning models, some Param objects are 100 times larger than others.
+To ensure the load-balance among servers, SINGA slices large Param objects. The
+slicing information is recorded by `slice_*`. Each slice is assigned a unique
+ID starting from 0. `slice_start_` is the ID of the first slice of this Param
+object. `slice_offset_[i]` is the offset of the i-th slice in this Param
+object. `slice_size_[i]` is the size of the i-th slice. This slice information
+is used to create messages for transferring parameter values or gradients to
+different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share
+this Blob although they may share `data_`, because each layer containing a
+Param object contributes its own gradients. E.g., in an RNN, the recurrent layers
+share parameter values, and the gradients used for updating are averaged over all
+these recurrent layers. In SINGA, the stub thread aggregates local
+gradients for the same Param object, and the server does a global aggregation
+of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., name and ID. It also has a
+field called `owner` which is the ID of the Param object that shares parameter
+values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+    virtual void InitValues(int version = 0);  // initialize values according to `init_method`
+    void ShareFrom(const Param& other);  // share `data_` from `other` Param
+    --------------
+    virtual Msg* GenGetMsg(bool copy, int slice_idx);
+    virtual Msg* GenPutMsg(bool copy, int slice_idx);
+    ... // other message related functions.
+
+Besides the functions for processing the parameter values, there is a set of
+functions for generating and parsing messages. These messages are for
+transferring parameter values or gradients between workers and servers. Each
+message corresponds to one Param slice. If `copy` is false, it means the
+receiver of this message is in the same process as the sender. In such case,
+only pointers to the memory of parameter value (or gradient) are wrapped in
+the message; otherwise, the parameter values (or gradients) should be copied
+into the message.
+
+
+## Implementing Param subclass
+Users can extend the base Param class to implement their own parameter
+initialization methods and message transferring protocols. Similar to
+implementing a new Layer subclass, users can create google protocol buffer
+messages for configuring the Param subclass. The subclass, denoted as FooParam,
+should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is reserved for the base Param type
+
+
+  * type, an integer representing the `Param` type. Currently SINGA provides one
+    `Param` implementation with type 0 (the default type). If users want
+    to use their own Param implementation, they should extend the base Param
+    class and configure this field with `kUserParam`
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programmer-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programmer-guide.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programmer-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programmer-guide.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,97 @@
+# Programmer Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components, while the *Advanced user guide* section presents details
+on writing a user's own main function to register components implemented by
+the user. In addition, the training data must be prepared, which follows the same
+[process](data.html) for both basic and advanced users.
+
+<img src="../../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. In this case, a job configuration file written as a google protocol
+buffer message for the [JobProto](../../api/classsinga_1_1JobProto.html) must be provided on the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume] [-test]
+
+* `-resume` is for continuing the training from last [checkpoint](checkpoint.html).
+* `-test` is for testing the performance of a previously trained model and extracting features from new data;
+more details are available [here](test.html).
+
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g., an
+[Updater](updater.html), the user has to write a main function to
+register these components. It is similar to Hadoop's main function. Generally,
+the main function should
+
+  * initialize SINGA, e.g., setup logging.
+
+  * register user-defined components.
+
+  * create and pass the job configuration to the SINGA driver.
+
+An example main function looks like
+
+    #include <string>
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse resume option from argv.
+
+      // register user defined layers
+      driver.RegisterLayer<FooLayer, std::string>("kFooLayer");
+      // register user defined updater
+      driver.RegisterUpdater<FooUpdater, std::string>("kFooUpdater");
+      ...
+      auto jobConf = driver.job_conf();
+      //  update jobConf
+
+      driver.Submit(resume, jobConf);
+      return 0;
+    }
+
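+The "parse resume option" step above could be implemented, for instance, as the
+following sketch (a plain argv scan; SINGA itself may handle this flag differently):
+
+    // illustrative only: minimal -resume flag detection
+    bool resume = false;
+    for (int i = 1; i < argc; i++)
+      if (std::string(argv[i]) == "-resume")
+        resume = true;
+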
+The Driver class' `Init` method loads a job configuration file provided by
+users as a command line argument (`-conf <job conf>`). The file contains at least the
+cluster topology, and `Init` returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker or Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver, which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable file, e.g., named *mysinga*. To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programming-guide.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programming-guide.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,95 @@
+# Programming Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components, while the *Advanced user guide* section presents details
+on writing a user's own main function to register components implemented by
+the user. In addition, the training data must be prepared, which follows the same
+[process](data.html) for both basic and advanced users.
+
+<img src="../../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. In this case, a job configuration file written as a google protocol
+buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided on the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume]
+
+`-resume` is for continuing the training from last
+[checkpoint](checkpoint.html).
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g., an
+[Updater](updater.html), the user has to write a main function to
+register these components. It is similar to Hadoop's main function. Generally,
+the main function should
+
+  * initialize SINGA, e.g., setup logging.
+
+  * register user-defined components.
+
+  * create and pass the job configuration to the SINGA driver.
+
+
+An example main function looks like
+
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse resume option from argv.
+
+      // register user defined layers
+      driver.RegisterLayer<FooLayer>(kFooLayer);
+      // register user defined updater
+      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
+      ...
+      auto jobConf = driver.job_conf();
+      //  update jobConf
+
+      driver.Train(resume, jobConf);
+      return 0;
+    }
+
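+The "update jobConf" step might, for example, select the training algorithm via
+the protobuf-generated setters (a sketch; the exact setter names are assumptions
+based on the job.conf fields shown in these docs):
+
+    // illustrative only: select BP as the TrainOneBatch algorithm
+    jobConf.mutable_train_one_batch()->set_alg(singa::kBackPropagation);
+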
+The Driver class' `Init` method loads a job configuration file provided by
+users as a command line argument (`-conf <job conf>`). The file contains at least the
+cluster topology, and `Init` returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker or Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver, which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable file, e.g., named *mysinga*. To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/quick-start.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/quick-start.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,177 @@
+# Quick Start
+
+---
+
+## SINGA setup
+
+See the [installation](installation.html) page for instructions on installing SINGA.
+
+### Running Zookeeper
+
+SINGA training uses [zookeeper](https://zookeeper.apache.org/). First make sure the zookeeper service is running.
+
+If you installed zookeeper using the provided thirdparty scripts, run:
+
+    # goto top level folder
+    cd  SINGA_ROOT
+    ./bin/zk-service.sh start
+
+(`./bin/zk-service.sh stop` stops the zookeeper service.)
+
+To start zookeeper on a port other than the default, edit `conf/singa.conf`:
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+## Running in standalone mode
+
+Running SINGA in standalone mode means running it without a cluster manager such as [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+
+### Training on a single node
+
+A single process is launched.
+As an example, we train a
+[CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)
+over the [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset.
+The hyper-parameters are set following [cuda-convnet](https://code.google.com/p/cuda-convnet/).
+See the [CNN example](cnn.html) page for details.
+
+
+#### Data and job configuration
+
+Download the dataset and create the data shards for training and testing as follows:
+
+    cd examples/cifar10/
+    cp Makefile.example Makefile
+    make download
+    make create
+
+The training and test datasets are created in the *cifar10-train-shard*
+and *cifar10-test-shard* folders, respectively. An *image_mean.bin* file, which stores the feature mean of all images, is also created.
+
+All source code needed to train the CNN model is built into SINGA; no extra code needs to be written.
+Simply run the script (*../../bin/singa-run.sh*) with the job configuration file (*job.conf*) specified.
+To change or add code in SINGA, see the [programming guide](programming-guide.html).
+
+#### Training without parallelism
+
+By default, the cluster topology has one worker and one server.
+Neither the data nor the neural net is partitioned.
+
+To start training, run the following:
+
+    # goto top level folder
+    cd ../../
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+
+To list the currently running jobs:
+
+    ./bin/singa-console.sh list
+
+    JOB ID    |NUM PROCS
+    ----------|-----------
+    24        |1
+
+To kill a job:
+
+    ./bin/singa-console.sh kill JOB_ID
+
+
+Logs and job information are saved in the */tmp/singa-log* folder, which
+can be changed via the `log-dir` field in *conf/singa.conf*.
+
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](architecture.html) is run by launching
+multiple worker groups.
+For example, change *job.conf* as shown above.
+By default, each worker group has a single worker.
+Since the above configuration sets two workers per process, the two worker
+groups run within the same process.
+As a result, training runs as the in-memory [Downpour](frameworks.html) framework.
+
+Users do not need to worry about distributing the data:
+the data is dispatched to the worker groups based on random offsets,
+and each worker handles a different partition of the data.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        sharddata_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+Run the script:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+[Synchronous training](architecture.html) is run by launching multiple workers within a single worker group.
+For example, change the *job.conf* file as shown above.
+The above configuration sets two workers in one worker group.
+The workers synchronize with each other within the group.
+This runs as the in-memory [sandblaster](frameworks.html) framework.
+The model is partitioned between the two workers: each layer is sliced over the two workers.
+Each sliced layer functions the same as the original layer, but handles `B/g` feature instances,
+where `B` is the number of instances in a mini-batch and `g` is the number of workers in the group
+(e.g., with `B` = 64 and `g` = 2, each worker processes 32 instances per iteration).
+There are also layer (neural network) partitioning methods that use [other schemes](neural-net.html).
+
+All other configuration is the same as in the "without parallelism" case.
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+### Training on a cluster
+
+Extend the training frameworks above by changing the cluster configuration:
+
+    nworkers_per_procs: 1
+
+Every process then launches exactly one worker thread;
+as a result, the workers are created in different processes (nodes).
+To identify the nodes in the cluster, a *hostfile* must be configured under *SINGA_ROOT/conf/*,
+
+e.g.,
+
+    logbase-a01
+    logbase-a02
+
+The zookeeper location must also be configured,
+
+e.g.,
+
+    #conf/singa.conf
+    zookeeper_host: "logbase-a01"
+
+Running the script is the same as for single-node training:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+## Running with Mesos
+
+*working*...
+
+## Next steps
+
+See the [programming guide](programming-guide.html) for details on changing or adding code to SINGA.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rbm.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rbm.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,365 @@
+# RBM Example
+
+---
+
+This example uses SINGA to train 4 RBM models and one auto-encoder model over the
+[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
+to reduce the dimensionality of the MNIST image feature. The RBM models are trained
+to initialize parameters of the auto-encoder model. This example application is
+from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).
+
+## Running instructions
+
+Running scripts are provided in *SINGA_ROOT/examples/rbm* folder.
+
+The MNIST dataset has 70,000 handwritten digit images. The
+[data preparation](data.html) page
+has details on converting this dataset into SINGA recognizable format. Users can
+simply run the following commands to download and convert the dataset.
+
+    # at SINGA_ROOT/examples/mnist/
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+The training is separated into two phases, namely pre-training and fine-tuning.
+The pre-training phase trains 4 RBMs in sequence,
+
+    # at SINGA_ROOT/
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
+    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf
+
+The fine-tuning phase trains the auto-encoder by,
+
+    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf
+
+
+## Training details
+
+### RBM1
+
+<img src="../images/example-rbm1.png" align="center" width="200px"/>
+<span><strong>Figure 1 - RBM1.</strong></span>
+
+The neural net structure for training RBM1 is shown in Figure 1.
+The data layer and parser layer provide features for training RBM1.
+The visible layer (connected with the parser layer) of RBM1 accepts the image feature
+(784 dimensions). The hidden layer is set to have 1000 neurons (units).
+These two layers are configured as,
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"mnist"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 1000
+      }
+      param{
+        name: "w1"
+        init{
+          type: kGaussian
+          mean: 0.0
+          std: 0.1
+        }
+      }
+      param{
+        name: "b11"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 1000
+      }
+      param{
+        name: "w1_"
+        share_from: "w1"
+      }
+      param{
+        name: "b12"
+        init{
+          type: kConstant
+          value: 0.0
+        }
+      }
+    }
+
+
+
+For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
+`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
+the `share_from` field to enable [parameter sharing](param.html),
+as shown above for the params `w1` and `w1_`.
+
+[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
+is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
+Following Hinton's paper, we configure the [updating protocol](updater.html)
+as follows,
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.2
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.1
+        type: kFixed
+      }
+    }
+
+Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
+configure the `workspace` field to specify a path for the checkpoint folder.
+For example, if we configure it as,
+
+    cluster {
+      workspace: "examples/rbm/rbm1/"
+    }
+
+Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.
+
+### RBM2
+<img src="../images/example-rbm2.png" align="center" width="200px"/>
+<span><strong>Figure 2 - RBM2.</strong></span>
+
+Figure 2 shows the net structure for training RBM2.
+The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
+is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
+from RBM1.
+The neural net configuration is as follows (with the data layer and parser layer omitted).
+
+    layer{
+      name: "Inner1"
+      type: kInnerProduct
+      srclayers:"mnist"
+      innerproduct_conf{
+        num_output: 1000
+      }
+      param{ name: "w1" }
+      param{ name: "b12"}
+    }
+
+    layer{
+      name: "Sigmoid1"
+      type: kSigmoid
+      srclayers:"Inner1"
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"Sigmoid1"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 500
+      }
+      param{
+        name: "w2"
+        ...
+      }
+      param{
+        name: "b21"
+        ...
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 500
+      }
+      param{
+        name: "w2_"
+        share_from: "w2"
+      }
+      param{
+        name: "b22"
+        ...
+      }
+    }
+
+To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,
+
+    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
+    cluster{
+      workspace: "examples/rbm/rbm2"
+    }
+
+The workspace is changed for checkpointing `w2`, `b21` and `b22` into
+*examples/rbm/rbm2/*.
+
+### RBM3
+
+<img src="../images/example-rbm3.png" align="center" width="200px"/>
+<span><strong>Figure 3 - RBM3.</strong></span>
+
+Figure 3 shows the net structure for training RBM3. In this model, a layer with
+250 units is added as the hidden layer of RBM3. The visible units of RBM3
+accept output from the Sigmoid2 layer. Parameters of Inner1 and Inner2 are set to
+`w1,b12,w2,b22`, which can be loaded from the checkpoint file of RBM2,
+i.e., "examples/rbm/rbm2/".
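+
+By analogy with RBM2, the checkpoint and workspace configuration for RBM3 would
+look like the following (the paths are assumptions following the pattern above):
+
+    checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
+    cluster{
+      workspace: "examples/rbm/rbm3"
+    }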
+
+### RBM4
+
+
+<img src="../images/example-rbm4.png" align="center" width="200px"/>
+<span><strong>Figure 4 - RBM4.</strong></span>
+
+Figure 4 shows the net structure for training RBM4. It is similar to Figure 3,
+but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
+top RBM (RBM4) have stochastic real-valued states drawn from a unit-variance
+Gaussian whose mean is determined by the input from the RBM's logistic visible
+units. So we add a `gaussian` field in the RBMHid layer to control the
+sampling distribution (Gaussian or Bernoulli). In addition, this
+RBM has a much smaller learning rate (0.001). The neural net configuration for
+RBM4 and the updating protocol is as follows (with the data layer and parser
+layer omitted),
+
+    # Updater Configuration
+    updater{
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0002
+      learning_rate{
+        base_lr: 0.001
+        type: kFixed
+      }
+    }
+
+    layer{
+      name: "RBMVis"
+      type: kRBMVis
+      srclayers:"Sigmoid3"
+      srclayers:"RBMHid"
+      rbm_conf{
+        hdim: 30
+      }
+      param{
+        name: "w4"
+        ...
+      }
+      param{
+        name: "b41"
+        ...
+      }
+    }
+
+    layer{
+      name: "RBMHid"
+      type: kRBMHid
+      srclayers:"RBMVis"
+      rbm_conf{
+        hdim: 30
+        gaussian: true
+      }
+      param{
+        name: "w4_"
+        share_from: "w4"
+      }
+      param{
+        name: "b42"
+        ...
+      }
+    }
+
+### Auto-encoder
+In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
+networks that are initialized using the parameters from the previous 4 RBMs.
+
+<img src="../images/example-autoencoder.png" align="center" width="500px"/>
+<span><strong>Figure 5 - Auto-Encoders.</strong></span>
+
+
+Figure 5 shows the neural net structure for training the auto-encoder.
+[Back propagation (kBP)](train-one-batch.html) is
+configured as the algorithm for `TrainOneBatch`. We use the same cluster
+configuration as for the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with a
+fixed learning rate.
+
+    ### Updater Configuration
+    updater{
+      type: kAdaGrad
+      learning_rate{
+        base_lr: 0.01
+        type: kFixed
+      }
+    }
+
+
+
+According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
+we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
+configuration is (with some of the middle layers omitted),
+
+    layer{ name: "data" }
+    layer{ name:"mnist" }
+    layer{
+      name: "Inner1"
+      param{ name: "w1" }
+      param{ name: "b12" }
+    }
+    layer{ name: "Sigmoid1" }
+    ...
+    layer{
+      name: "Inner8"
+      innerproduct_conf{
+        num_output: 784
+        transpose: true
+      }
+      param{
+        name: "w8"
+        share_from: "w1"
+      }
+      param{ name: "b11" }
+    }
+    layer{ name: "Sigmoid8" }
+
+    # Euclidean Loss Layer Configuration
+    layer{
+      name: "loss"
+      type:kEuclideanLoss
+      srclayers:"Sigmoid8"
+      srclayers:"mnist"
+    }
+
+To load pre-trained parameters from the 4 RBMs' checkpoint file we configure `checkpoint_path` as
+
+    ### Checkpoint Configuration
+    checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"
+
+
+## Visualization Results
+
+<div>
+<img src="../images/rbm-weight.PNG" align="center" width="300px"/>
+
+<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
+<br/>
+<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
+&nbsp;
+&nbsp;
+&nbsp;
+&nbsp;
+
+<span><strong>Figure 7 - Top layer features.</strong></span>
+</div>
+
+Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
+Gabor-like filters are learned. Figure 7 depicts the features extracted from
+the top layer of the auto-encoder, wherein one point represents one image.
+Different colors represent different digits. We can see that most images are
+well clustered according to the ground truth.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rnn.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rnn.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,420 @@
+# Recurrent Neural Networks for Language Modelling
+
+---
+
+Recurrent Neural Networks (RNN) are widely used for modelling sequential data,
+such as music and sentences.  In this example, we use SINGA to train a
+[RNN model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
+proposed by Tomas Mikolov for [language modeling](https://en.wikipedia.org/wiki/Language_model).
+The training objective (loss) is
+to minimize the [perplexity per word](https://en.wikipedia.org/wiki/Perplexity), which
+is equivalent to maximizing the probability of predicting the next word given the current word in
+a sentence.
+
+Different from the [CNN](cnn.html), [MLP](mlp.html)
+and [RBM](rbm.html) examples, which use built-in
+layers and records,
+none of the layers in this example are built-in. Hence users can learn to
+implement their own layers and data records through this example.
+
+## Running instructions
+
+In *SINGA_ROOT/examples/rnnlm/*, scripts are provided to run the training job.
+First, the data is prepared by
+
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+Second, to compile the source code under *examples/rnnlm/*, run
+
+    $ make rnnlm
+
+An executable file *rnnlm.bin* will be generated.
+
+Third, the training is started by passing *rnnlm.bin* and the job configuration
+to *singa-run.sh*,
+
+    # at SINGA_ROOT/
+    # export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
+    $ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf
+
+## Implementations
+
+<img src="../images/rnnlm.png" align="center" width="400px"/>
+<span><strong>Figure 1 - Net structure of the RNN model.</strong></span>
+
+The neural net structure is shown in Figure 1.  Word records are loaded by
+`DataLayer`. For every iteration, at most `max_window` word records are
+processed. If a sentence ending character is read, the `DataLayer` stops
+loading immediately. `EmbeddingLayer` looks up a word embedding matrix to extract
+feature vectors for words loaded by the `DataLayer`.  These features are transformed by the
+`HiddenLayer` which propagates the features from left to right. The
+output feature for word at position k is influenced by words from position 0 to
+k-1.  Finally, `LossLayer` computes the cross-entropy loss (see below)
+by predicting the next word of each word.
+The cross-entropy loss is computed as
+
+`$$L(w_t)=-log P(w_{t+1}|w_t)$$`
+
+Given `$w_t$`, the above equation requires computing over all words in the vocabulary,
+which is time consuming.
+[RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz)
+accelerates the computation as
+
+`$$P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})$$`
+
+Words from the vocabulary are partitioned into a user-defined number of classes.
+The first factor on the right-hand side predicts the class of the next word, and
+the second predicts the next word given its class. Both the number of classes and
+the number of words in one class are much smaller than the vocabulary size, so the probabilities
+can be calculated much faster. For example, with the 3720-word vocabulary and 100 classes used below,
+each softmax ranges over roughly 37 to 100 entries instead of 3720.
+
+The perplexity per word is computed by,
+
+`$$PPL = 10^{- avg_t log_{10} P(w_{t+1}|w_t)}$$`
+
+### Data preparation
+
+We use a small dataset provided by the [RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz).
+It has 10,000 training sentences, with 71350 words in total and 3720 unique words.
+The subsequent steps follow the instructions in
+[Data Preparation](data.html) to convert the
+raw data into records and insert them into data stores.
+
+#### Download source data
+
+    # in SINGA_ROOT/examples/rnnlm/
+    cp Makefile.example Makefile
+    make download
+
+#### Define record format
+
+We define the word record as follows,
+
+    # in SINGA_ROOT/examples/rnnlm/rnnlm.proto
+    message WordRecord {
+      optional string word = 1;
+      optional int32 word_index = 2;
+      optional int32 class_index = 3;
+      optional int32 class_start = 4;
+      optional int32 class_end = 5;
+    }
+
+It includes the word string and its index in the vocabulary.
+Words in the vocabulary are sorted based on their frequency in the training dataset.
+The sorted list is cut into 100 sublists such that each sublist has 1/100 total
+word frequency. Each sublist is called a class.
+Hence each word has a `class_index` in the range [0, 100). The `class_start` is the index
+of the first word in the same class as `word`. The `class_end` is the index of
+the first word in the next class.
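+
+For instance, a hypothetical record for a frequent word could be filled in as
+follows (all values are illustrative only):
+
+    word: "the"
+    word_index: 0
+    class_index: 0
+    class_start: 0
+    class_end: 37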
+
+#### Create data stores
+
+We use code from RNNLM Toolkit to read words, and sort them into classes.
+The main function in *create_store.cc* first creates word classes based on the training
+dataset. Second, it calls the following function to create a data store for each of the
+training, validation and test datasets.
+
+    int create_data(const char *input_file, const char *output_file);
+
+`input_file` is the path to a training/validation/test text file from the RNNLM Toolkit; `output_file` is the resulting store file.
+This function starts with
+
+    singa::io::KVFile store;
+    store.Open(output_file, singa::io::kCreate);
+
+Then it reads the words one by one. For each word it creates a `WordRecord` instance,
+and inserts it into the store,
+
+    int wcnt = 0; // word count
+    WordRecord  wordRecord;
+    while(1) {
+      readWord(wordstr, fin);
+      if (feof(fin)) break;
+      ...// fill in the wordRecord;
+      string val;
+      wordRecord.SerializeToString(&val);
+      int length = snprintf(key, BUFFER_LEN, "%05d", wcnt++);
+      store.Write(string(key, length), val);
+    }
+
+Compilation and running commands are provided in the *Makefile.example*.
+After executing
+
+    make create
+
+*train_data.bin*, *test_data.bin* and *valid_data.bin* will be created.
+
+
+### Layer implementation
+
+4 user-defined layers are implemented for this application.
+Following the guide for implementing [new Layer subclasses](layer.html#implementing-a-new-layer-subclass),
+we extend the [LayerProto](../api/classsinga_1_1LayerProto.html)
+to include the configuration messages of user-defined layers as shown below
+(3 of the layers have specific configurations),
+
+
+    import "job.proto";     // Layer message for SINGA is defined
+
+    //For implementation of RNNLM application
+    extend singa.LayerProto {
+      optional EmbeddingProto embedding_conf = 101;
+      optional LossProto loss_conf = 102;
+      optional DataProto data_conf = 103;
+    }
+
+In the subsequent sections, we describe the implementation of each layer,
+including its configuration message.
+
+#### RNNLayer
+
+This is the base layer of all other layers in this application. It is defined
+as follows,
+
+    class RNNLayer : virtual public Layer {
+    public:
+      inline int window() { return window_; }
+    protected:
+      int window_;
+    };
+
+In this application, two iterations may process different numbers of words,
+because sentences have different lengths.
+The `DataLayer` decides the effective window size. All other layers query their source layers for the
+effective window size and reset `window_` in the `ComputeFeature` function.
+
+#### DataLayer
+
+DataLayer is for loading Records.
+
+    class DataLayer : public RNNLayer, singa::InputLayer {
+     public:
+      void Setup(const LayerProto& proto, const vector<Layer*>& srclayers) override;
+      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
+      int max_window() const {
+        return max_window_;
+      }
+     private:
+      int max_window_;
+      singa::io::Store* store_;
+    };
+
+The Setup function gets the user-configured max window size.
+
+    max_window_ = proto.GetExtension(input_conf).max_window();
+
+The `ComputeFeature` function loads at most `max_window` records. It stops early
+when the sentence-ending character is encountered.
+
+    ...// shift the last record to the first
+    window_ = max_window_;
+    for (int i = 1; i <= max_window_; i++) {
+      // load record; break if it is the ending character
+    }
+
+The configuration of `DataLayer` is like
+
+    name: "data"
+    user_type: "kData"
+    [data_conf] {
+      path: "examples/rnnlm/train_data.bin"
+      max_window: 10
+    }
+
+#### EmbeddingLayer
+
+This layer gets records from `DataLayer`. For each record, the word index is
+parsed and used to get the corresponding word feature vector from the embedding
+matrix.
+
+The class is declared as follows,
+
+    class EmbeddingLayer : public RNNLayer {
+      ...
+      const std::vector<Param*> GetParams() const override {
+        std::vector<Param*> params{embed_};
+        return params;
+      }
+     private:
+      int word_dim_, vocab_size_;
+      Param* embed_;
+    }
+
+The `embed_` field is a matrix whose values are parameters to be learned.
+The matrix size is `vocab_size_` x `word_dim_`.
+
+The Setup function reads the configuration for `word_dim_` and `vocab_size_`. Then
+it allocates the feature Blob for `max_window` words and sets up `embed_`.
+
+    int max_window = srclayers[0]->data(this).shape()[0];
+    word_dim_ = proto.GetExtension(embedding_conf).word_dim();
+    data_.Reshape(vector<int>{max_window, word_dim_});
+    ...
+    embed_->Setup(vector<int>{vocab_size_, word_dim_});
+
+The `ComputeFeature` function simply copies the feature vector from the `embed_`
+matrix into the feature Blob.
+
+    # reset effective window size
+    window_ = datalayer->window();
+    auto records = datalayer->records();
+    ...
+    for (int t = 0; t < window_; t++) {
+      int idx  <- word index
+      Copy(words[t], embed[idx]);
+    }
+
+The `ComputeGradient` function copies back the gradients to the `embed_` matrix.
+
+The configuration for `EmbeddingLayer` is like,
+
+    user_type: "kEmbedding"
+    [embedding_conf] {
+      word_dim: 15
+      vocab_size: 3720
+    }
+    srclayers: "data"
+    param {
+      name: "w1"
+      init {
+        type: kUniform
+        low:-0.3
+        high:0.3
+      }
+    }
+
+#### HiddenLayer
+
+This layer unrolls the recurrent connections for at most `max_window` steps.
+The feature for position k is computed based on the feature from the embedding layer (position k)
+and the feature at position k-1 of this layer. The formula is
+
+`$$f[k]=\sigma (f[k-1]*W+src[k])$$`
+
+where `$W$` is a matrix with `word_dim_` x `word_dim_` parameters.
+
+If you want to implement a recurrent neural network following our
+design, this layer is the key one to refer to.
+
+    class HiddenLayer : public RNNLayer {
+      ...
+      const std::vector<Param*> GetParams() const override {
+        std::vector<Param*> params{weight_};
+        return params;
+      }
+    private:
+      Param* weight_;
+    };
+
+The `Setup` function sets up the weight matrix as
+
+    weight_->Setup(std::vector<int>{word_dim, word_dim});
+
+The `ComputeFeature` function gets the effective window size (`window_`) from its source layer,
+i.e., the embedding layer. Then it propagates the feature from position 0 to position
+`window_ - 1`, as sketched below.
+
+    void HiddenLayer::ComputeFeature() {
+      for (int k = 0; k < window_; k++) {
+        if (k == 0)
+          Copy(data[k], src[k]);  // first position: no recurrent input
+        else
+          data[k] = sigmoid(data[k-1] * W + src[k]);
+      }
+    }
+
+The `ComputeGradient` function computes the gradient of the loss w.r.t. W and the source layer.
+For each position k, data[k] contributes to data[k+1] and to the feature
+at position k in its destination layer (the loss layer), so grad[k] should contain the gradients
+from both parts. The destination layer has already accumulated the gradient from the loss layer into
+grad[k]; in the `ComputeGradient` function, we need to add the gradient from position k+1.
+
+    void HiddenLayer::ComputeGradient(){
+      ...
+      for (int k = window_ - 1; k >= 0; k--) {
+        if (k < window_ - 1) {
+          grad[k] += dot(grad[k + 1], weight.T()); // add gradient from position k+1.
+        }
+        grad[k] = ...  // compute dL/dy[k], where y[k] = data[k-1]*W + src[k]
+      }
+      gweight = dot(data.Slice(0, window_-1).T(), grad.Slice(1, window_));
+      Copy(gsrc, grad);
+    }
+
+After the loop, we get the gradient of the loss w.r.t. y[k], which is used to
+compute the gradients of W and src[k].
+
+#### LossLayer
+
+This layer computes the cross-entropy loss and the `$log_{10}P(w_{t+1}|w_t)$` (which
+could be averaged over all words by users to get the PPL value).
+
+There are two configuration fields to be specified by users.
+
+    message LossProto {
+      optional int32 nclass = 1;
+      optional int32 vocab_size = 2;
+    }
+
+There are two weight matrices to be learned
+
+    class LossLayer : public RNNLayer {
+      ...
+     private:
+      Param* word_weight_, *class_weight_;
+    }
+
+The ComputeFeature function computes the two probabilities as follows.
+
+`$$P(C_{w_{t+1}}|w_t) = Softmax(w_t * class\_weight)$$`
+`$$P(w_{t+1}|C_{w_{t+1}}) = Softmax(w_t * word\_weight[class\_start:class\_end])$$`
+
+`$w_t$` is the feature from the hidden layer for the t-th word, whose ground truth
+next word is `$w_{t+1}$`.  The first equation computes the probability distribution over all
+classes for the next word. The second equation computes the
+probability distribution over the words in the ground-truth class for the next word.
+
+The ComputeGradient function computes the gradient of the source layer
+(i.e., the hidden layer) and the two weight matrices.
+
+### Updater Configuration
+
+We employ the kFixedStep type of learning rate schedule; the
+configuration is as follows. We decay the learning rate once the performance
+no longer improves on the validation dataset.
+
+    updater{
+      type: kSGD
+      learning_rate {
+        type: kFixedStep
+        fixedstep_conf:{
+          step:0
+          step:48810
+          step:56945
+          step:65080
+          step:73215
+          step_lr:0.1
+          step_lr:0.05
+          step_lr:0.025
+          step_lr:0.0125
+          step_lr:0.00625
+        }
+      }
+    }
+
+### TrainOneBatch() Function
+
+We use BP (BackPropagation) algorithm to train the RNN model here. The
+corresponding configuration can be seen below.
+
+    # In job.conf file
+    train_one_batch {
+      alg: kBackPropagation
+    }
+
+### Cluster Configuration
+
+The default cluster configuration can be used, i.e., single worker and single server
+in a single process.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/test.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/test.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/test.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/test.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,119 @@
+# Performance Test and Feature Extraction
+
+----
+
+Once SINGA finishes training a model, it checkpoints the model parameters
+into disk files under the [checkpoint folder](checkpoint.html). Model parameters can also be dumped
+into this folder periodically during training if the
+[checkpoint configuration](checkpoint.html) fields are set. With the checkpoint
+files, we can load the model parameters to conduct performance tests, feature extraction and prediction
+against new data.
+
+To load the model parameters from checkpoint files, we need to add the paths of
+checkpoint files in the job configuration file
+
+    checkpoint_path: PATH_TO_CHECKPOINT_FILE1
+    checkpoint_path: PATH_TO_CHECKPOINT_FILE2
+    ...
+
+The new dataset is configured by specifying the `test_steps` field and the data input
+layer, e.g., the following configuration is for a dataset with 100x100 = 10,000 instances
+(100 test steps with a batch size of 100).
+
+    test_steps: 100
+    net {
+      layer {
+        name: "input"
+        store_conf {
+          backend: "kvfile"
+          path: PATH_TO_TEST_KVFILE
+          batchsize: 100
+        }
+      }
+      ...
+    }
+
+## Performance Test
+
+This application tests the performance, e.g., accuracy, of a previously
+trained model. Depending on the application, the test data may or may not have ground
+truth labels. For example, if the model is trained for image classification,
+the test images must have ground truth labels to calculate the accuracy; if the
+model is an auto-encoder, the performance could be measured by reconstruction error, which
+does not require extra labels. For both cases, there would be a layer that calculates
+the performance, e.g., the `SoftmaxLossLayer`.
+
+The job configuration file for the cifar10 example can be used directly for testing after
+adding the checkpoint path. The running command is
+
+
+    $ ./bin/singa-run.sh -conf examples/cifar10/job.conf -test
+
+The performance would be output on the screen like,
+
+
+    Load from checkpoint file examples/cifar10/checkpoint/step50000-worker0
+    accuracy = 0.728000, loss = 0.807645
+
+## Feature extraction
+
+Since deep learning models are good at learning features, feature extraction
+is a major functionality of deep learning models, e.g., we can extract features
+from the fully connected layers of [AlexNet](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf) as image features for image retrieval.
+To extract the features from one layer, we simply add an output layer after that layer.
+For instance, to extract features from the fully connected layer (named `ip1`) of the cifar10 example model,
+we replace the `SoftmaxLossLayer` with a `CSVOutputLayer`, which writes the features into a CSV file,
+
+    layer {
+      name: "ip1"
+    }
+    layer {
+      name: "output"
+      type: kCSVOutput
+      srclayers: "ip1"
+      store_conf {
+        backend: "textfile"
+        path: OUTPUT_FILE_PATH
+      }
+    }
+
+The input layer, test steps, and running command are the same as in the *Performance Test* section.
+
+## Label Prediction
+
+If the output layer is connected to a layer that predicts labels of images,
+the output layer would then write the prediction results into files.
+SINGA provides two built-in layers for generating prediction results, namely,
+
+* SoftmaxLayer, generates the probability of each candidate label.
+* ArgSortLayer, sorts labels by probability in descending order and keeps the top-k labels.
+
+By connecting the two layers with the previous layer and the output layer, we can
+extract the predictions of each instance. For example,
+
+    layer {
+      name: "feature"
+      ...
+    }
+    layer {
+      name: "softmax"
+      type: kSoftmax
+      srclayers: "feature"
+    }
+    layer {
+      name: "prediction"
+      type: kArgSort
+      srclayers: "softmax"
+      argsort_conf {
+        topk: 5
+      }
+    }
+    layer {
+      name: "output"
+      type: kCSVOutput
+      srclayers: "prediction"
+      store_conf {}
+    }
+
+The top-5 labels of each instance will be written as one line of the output CSV file.
+Currently, the above layers cannot co-exist with the loss layers used for training.
+Please comment out the loss layers when extracting prediction results.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/train-one-batch.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/train-one-batch.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/train-one-batch.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/train-one-batch.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,179 @@
+# Train-One-Batch
+
+---
+
+For each SGD iteration, every worker calls the `TrainOneBatch` function to
+compute gradients of parameters associated with local layers (i.e., layers
+dispatched to it).  SINGA has implemented two algorithms for the
+`TrainOneBatch` function. Users select the corresponding algorithm for
+their model in the configuration.
+
+## Basic user guide
+
+### Back-propagation
+
+[BP algorithm](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) is used for
+computing gradients of feed-forward models, e.g., [CNN](cnn.html)
+and [MLP](mlp.html), as well as [RNN](rnn.html) models in SINGA.
+
+
+    # in job.conf
+    alg: kBP
+
+To use the BP algorithm for the `TrainOneBatch` function, users simply
+configure the `alg` field with `kBP`. If a neural net contains user-defined
+layers, these layers must be implemented properly to be consistent with the
+implementation of the BP algorithm in SINGA (see below); a sketch of the
+required layer interface follows.
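+
+As a sketch (the exact method signatures should be checked against the `Layer`
+base class of your SINGA version; the ones below are assumptions), a
+BP-compatible user-defined layer overrides both the forward and the backward
+computation,
+
+    // Illustrative skeleton only; signatures are assumptions.
+    class FooLayer : public Layer {
+     public:
+      void ComputeFeature(int flag, Metric* perf) override;  // forward: compute data_
+      void ComputeGradient(int flag) override;               // backward: compute grad_ and param gradients
+    };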
+
+
+### Contrastive Divergence
+
+[CD algorithm](http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) is used for
+computing gradients of energy models like RBM.
+
+    # in job.conf
+    alg: kCD
+    cd_conf {
+      cd_k: 2
+    }
+
+To use the CD algorithm for the `TrainOneBatch` function, users simply configure
+the `alg` field to `kCD`. Users can also configure the number of Gibbs sampling steps in
+the CD algorithm through the `cd_k` field. By default, it is set to 1.
+
+
+
+## Advanced user guide
+
+### Implementation of BP
+
+The BP algorithm is implemented in SINGA following the pseudo code below,
+
+    BPTrainOnebatch(step, net) {
+      // forward propagate
+      foreach layer in net.local_layers() {
+        if IsBridgeDstLayer(layer)
+          recv data from the src layer (i.e., BridgeSrcLayer)
+        foreach param in layer.params()
+          Collect(param) // recv response from servers for last update
+
+        layer.ComputeFeature(kForward)
+
+        if IsBridgeSrcLayer(layer)
+          send layer.data_ to dst layer
+      }
+      // backward propagate
+      foreach layer in reverse(net.local_layers) {
+        if IsBridgeSrcLayer(layer)
+          recv gradient from the dst layer (i.e., BridgeDstLayer)
+          recv response from servers for last update
+
+        layer.ComputeGradient()
+        foreach param in layer.params()
+          Update(step, param) // send param.grad_ to servers
+
+        if IsBridgeDstLayer(layer)
+          send layer.grad_ to src layer
+      }
+    }
+
+
+It forwards features through all local layers (locality can be checked via the
+layer partition ID and worker ID) and back-propagates gradients in the reverse order.
+A [BridgeSrcLayer](layer.html#bridgesrclayer--bridgedstlayer)
+(resp. `BridgeDstLayer`) blocks until the feature (resp.
+gradient) from the source (resp. destination) layer arrives. Parameter gradients
+are sent to servers via the `Update` function. Updated parameters are collected via
+the `Collect` function, which blocks until the parameter is updated.
+[Param](param.html) objects have versions, which can be used to
+check whether a `Param` object has been updated or not.
+
+Since RNN models are unrolled into feed-forward models, users need to implement
+the forward propagation in the recurrent layer's `ComputeFeature` function,
+and implement the backward propagation in the recurrent layer's `ComputeGradient`
+function. As a result, the whole `TrainOneBatch` runs the
+[back-propagation through time (BPTT)](https://en.wikipedia.org/wiki/Backpropagation_through_time) algorithm.
+
+### Implementation of CD
+
+The CD algorithm is implemented in SINGA following the pseudo code below,
+
+    CDTrainOneBatch(step, net) {
+      # positive phase
+      foreach layer in net.local_layers()
+        if IsBridgeDstLayer(layer)
+          recv positive phase data from the src layer (i.e., BridgeSrcLayer)
+        foreach param in layer.params()
+          Collect(param)  // recv response from servers for last update
+        layer.ComputeFeature(kPositive)
+        if IsBridgeSrcLayer(layer)
+          send positive phase data to dst layer
+
+      # negative phase
+      foreach gibbs in [0...layer_proto_.cd_k]
+        foreach layer in net.local_layers()
+          if IsBridgeDstLayer(layer)
+            recv negative phase data from the src layer (i.e., BridgeSrcLayer)
+          layer.ComputeFeature(kNegative)
+          if IsBridgeSrcLayer(layer)
+            send negative phase data to dst layer
+
+      foreach layer in net.local_layers()
+        layer.ComputeGradient()
+        foreach param in layer.params()
+          Update(param)
+    }
+
+Parameter gradients are computed after the positive and negative phases.
+
+### Implementing a new algorithm
+
+SINGA implements BP and CD by creating two subclasses of
+the [Worker](../api/classsinga_1_1Worker.html) class:
+[BPWorker](../api/classsinga_1_1BPWorker.html)'s `TrainOneBatch` function implements the BP
+algorithm; [CDWorker](../api/classsinga_1_1CDWorker.html)'s `TrainOneBatch` function implements the CD
+algorithm. To implement a new algorithm for the `TrainOneBatch` function, users
+need to create a new subclass of the `Worker`, e.g.,
+
+    class FooWorker : public Worker {
+      void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
+      void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
+    };
+
+The `FooWorker` must implement the above two functions for training one
+mini-batch and testing one mini-batch. The `perf` argument is for collecting
+training or testing performance, e.g., the objective loss or accuracy. It is
+passed to the `ComputeFeature` function of each layer.
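+
+As a minimal sketch of a BP-style `TrainOneBatch` (the calls mirror the BP
+pseudo code above; the `NeuralNet`/`Layer` accessors and the
+`kForward`/`kBackward` flags are assumptions for illustration, not the exact
+SINGA API),
+
+    // Sketch only: forward pass over local layers, then backward pass.
+    void FooWorker::TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) {
+      const auto& layers = net->layers();        // assumed accessor
+      for (auto* layer : layers) {
+        for (auto* param : layer->GetParams())   // assumed accessor
+          Collect(param);                        // wait for updated params from servers
+        layer->ComputeFeature(kForward, perf);   // forward: fill in layer data
+      }
+      for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
+        (*it)->ComputeGradient(kBackward);       // backward: compute gradients
+        for (auto* param : (*it)->GetParams())
+          Update(step, param);                   // send gradients to servers
+      }
+    }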
+
+Users can define configuration fields for `FooWorker`, e.g.,
+
+    # in user.proto
+    message FooWorkerProto {
+      optional int32 b = 1;
+    }
+
+    extend JobProto {
+      optional FooWorkerProto foo_conf = 101;
+    }
+
+    # in job.proto
+    message JobProto {
+      ...
+      extensions 101 to max;
+    }
+
+It is similar as [adding configuration fields for a new layer](layer.html#implementing-a-new-layer-subclass).
+
+To use `FooWorker`, users need to register it in the [main.cc](programming-guide.html)
+and configure the `alg` and `foo_conf` fields,
+
+    # in main.cc
+    const int kFoo = 3; // ID of the new algorithm; must differ from those of CDWorker and BPWorker
+    driver.RegisterWorker<FooWorker>(kFoo);
+
+    # in job.conf
+    ...
+    alg: 3
+    [foo_conf] {
+      b: 4
+    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/jp/updater.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/jp/updater.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/jp/updater.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/jp/updater.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,284 @@
+# Updater
+
+---
+
+Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html)
+instance that updates parameters based on gradients.
+In this page, the *Basic user guide* describes the configuration of an updater.
+The *Advanced user guide* presents details on how to implement a new updater and a new
+learning rate changing method.
+
+## Basic user guide
+
+There are many different parameter updating protocols (i.e., subclasses of
+`Updater`). They share some configuration fields like
+
+* `type`, an integer for identifying an updater;
+* `learning_rate`, configuration for the
+[LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate;
+* `weight_decay`, the coefficient for [L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization);
+* [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).
+
+If you are not familiar with the above terms, you can get their meanings in
+[this page provided by Karpathy](http://cs231n.github.io/neural-networks-3/#update).
+
+### Configuration of built-in updater classes
+
+#### Updater
+The base `Updater` implements the [vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd).
+Its configuration type is `kSGD`.
+Users need to configure at least the `learning_rate` field.
+`momentum` and `weight_decay` are optional fields.
+
+    updater {
+      type: kSGD
+      momentum: float
+      weight_decay: float
+      learning_rate {
+        ...
+      }
+    }
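+
+For example, a concrete configuration with illustrative values is,
+
+    updater {
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0005
+      learning_rate {
+        type: kFixed
+        base_lr: 0.01
+      }
+    }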
+
+#### AdaGradUpdater
+
+It inherits the base `Updater` to implement the
+[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm.
+Its type is `kAdaGrad`.
+`AdaGradUpdater` is configured similarly to `Updater` except
+that `momentum` is not used.
+
+#### NesterovUpdater
+
+It inherits the base `Updater` to implement the
+[Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating protocol.
+Its type is `kNesterov`.
+`learning_rate` and `momentum` must be configured. `weight_decay` is an
+optional configuration field.
+
+#### RMSPropUpdater
+
+It inherits the base `Updater` to implement the
+[RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by
+[Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf) (slide 29).
+Its type is `kRMSProp`.
+
+    updater {
+      type: kRMSProp
+      rmsprop_conf {
+       rho: float # [0,1]
+      }
+    }
+
+
+### Configuration of learning rate
+
+The `learning_rate` field is configured as,
+
+    learning_rate {
+      type: ChangeMethod
+      base_lr: float  # base/initial learning rate
+      ... # fields to a specific changing method
+    }
+
+The common fields include `type` and `base_lr`. SINGA provides the following
+`ChangeMethod`s.
+
+#### kFixed
+
+The `base_lr` is used for all steps.
+
+#### kLinear
+
+The updater should be configured like
+
+    learning_rate {
+      base_lr:  float
+      linear_conf {
+        freq: int
+        final_lr: float
+      }
+    }
+
+Linear interpolation is used to change the learning rate,
+
+    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
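+
+For example, with `base_lr: 0.1`, `final_lr: 0.01` and `freq: 1000`, the
+learning rate at step 500 is (1 - 0.5) * 0.1 + 0.5 * 0.01 = 0.055.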
+
+#### kExponential
+
+The updater should be configured like
+
+    learning_rate {
+      base_lr: float
+      exponential_conf {
+        freq: int
+      }
+    }
+
+The learning rate for `step` is
+
+    lr = base_lr / 2^(step / freq)
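+
+For example, with `base_lr: 0.01` and `freq: 1000`, the learning rate is halved
+every 1000 steps; at step 3000 it is 0.01 / 2^3 = 0.00125.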
+
+#### kInverseT
+
+The updater should be configured like
+
+    learning_rate {
+      base_lr: float
+      inverset_conf {
+        final_lr: float
+      }
+    }
+
+The learning rate for `step` is
+
+    lr = base_lr / (1 + step / final_lr)
+
+#### kInverse
+
+The updater should be configured like
+
+    learning_rate {
+      base_lr: float
+      inverse_conf {
+        gamma: float
+        pow: float
+      }
+    }
+
+
+The learning rate for `step` is
+
+    lr = base_lr * (1 + gamma * step)^(-pow)
+
+
+#### kStep
+
+The updater should be configured like
+
+    learning_rate {
+      base_lr : float
+      step_conf {
+        change_freq: int
+        gamma: float
+      }
+    }
+
+
+The learning rate for `step` is
+
+    lr = base_lr * gamma^ (step / change_freq)
+
+#### kFixedStep
+
+The updater should be configured like
+
+    learning_rate {
+      fixedstep_conf {
+        step: int
+        step_lr: float
+
+        step: int
+        step_lr: float
+
+        ...
+      }
+    }
+
+Denote the i-th tuple as (step[i], step_lr[i]); the learning rate for
+`step` is then,
+
+    step_lr[k]
+
+where step[k] is the largest configured step that is not larger than `step`.
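+
+A small sketch of this lookup (assuming the tuples are listed in increasing
+order of `step`; the function and variable names are illustrative),
+
+    #include <vector>
+
+    // Returns the lr of the last tuple whose step is <= the current step.
+    float FixedStepLR(const std::vector<int>& steps,
+                      const std::vector<float>& lrs, int step) {
+      float lr = lrs.front();
+      for (size_t i = 0; i < steps.size(); ++i)
+        if (steps[i] <= step) lr = lrs[i];
+      return lr;
+    }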
+
+
+## Advanced user guide
+
+### Implementing a new Updater subclass
+
+The base Updater class has one virtual function,
+
+    class Updater{
+     public:
+      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;
+
+     protected:
+      UpdaterProto proto_;
+      LRGenerator lr_gen_;
+    };
+
+It updates the values of the `param` based on its gradients. The `step`
+argument is for deciding the learning rate which may change through time
+(step). `grad_scale` scales the original gradient values. This function is
+called by a server once it has received all gradients for the same `Param` object.
+
+To implement a new Updater subclass, users must override the `Update` function.
+
+    class FooUpdater : public Updater {
+      void Update(int step, Param* param, float grad_scale = 1.0f) override;
+    };
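+
+For illustration, a vanilla SGD body could look like the following; the `Param`
+accessors (`mutable_cpu_data`, `cpu_grad`, `size`) and the proto accessor are
+assumptions, not the exact SINGA API,
+
+    // Sketch only: plain SGD with weight decay.
+    void FooUpdater::Update(int step, Param* param, float grad_scale) {
+      float lr = lr_gen_.Get(step);             // learning rate for this step
+      float wd = proto_.weight_decay();         // assumed proto accessor
+      float* data = param->mutable_cpu_data();  // assumed accessor
+      const float* grad = param->cpu_grad();    // assumed accessor
+      for (int i = 0; i < param->size(); ++i)
+        data[i] -= lr * (grad_scale * grad[i] + wd * data[i]);
+    }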
+
+Configuration of this new updater can be declared similarly to that of a new
+layer,
+
+    # in user.proto
+    message FooUpdaterProto {
+      optional int32 c = 1;
+    }
+
+    extend UpdaterProto {
+      optional FooUpdaterProto fooupdater_conf = 101;
+    }
+
+The new updater should be registered in the
+[main function](programming-guide.html),
+
+    driver.RegisterUpdater<FooUpdater>("FooUpdater");
+
+Users can then configure the job as
+
+    # in job.conf
+    updater {
+      user_type: "FooUpdater"  # must use user_type with the same string identifier as the one used for registration
+      fooupdater_conf {
+        c: 20
+      }
+    }
+
+### Implementing a new LRGenerator subclass
+
+The base `LRGenerator` is declared as,
+
+    virtual float Get(int step);
+
+To implement a subclass, e.g., `FooLRGen`, users should declare it like
+
+    class FooLRGen : public LRGenerator {
+     public:
+      float Get(int step) override;
+    };
+
+Configuration of `FooLRGen` can be defined using a protocol message,
+
+    # in user.proto
+    message FooLRProto {
+     ...
+    }
+
+    extend LRGenProto {
+      optional FooLRProto foolr_conf = 101;
+    }
+
+The configuration is then like,
+
+    learning_rate {
+      user_type : "FooLR" # must use user_type with the same string identifier as the one used for registration
+      base_lr: float
+      foolr_conf {
+        ...
+      }
+    }
+
+Users have to register this subclass in the main function,
+
+      driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");