Posted to commits@singa.apache.org by wa...@apache.org on 2016/01/13 04:46:20 UTC
svn commit: r1724348 [5/6] - in
/incubator/singa/site/trunk/content/markdown/docs: ./ jp/ kr/
Added: incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,294 @@
+# Model Configuration
+
+---
+
+SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters
+of deep learning models. For each SGD iteration, a
+[Worker](architecture.html) computes
+gradients of parameters from the NeuralNet, and an [Updater]() updates parameter
+values based on the gradients. Hence the model configuration mainly consists of
+these three parts. We introduce the NeuralNet, Worker and Updater in the
+following paragraphs and describe their configurations. All model
+configuration is specified in the model.conf file in the user-provided
+workspace folder. E.g., the [cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10)
+has a model.conf file.
+
+
+## NeuralNet
+
+### Uniform model (neuralnet) representation
+
+<img src="../images/model-categorization.png" style="width: 400px"/>
+<span><strong>Fig. 1 - Deep learning model categorization.</strong></span>
+
+Many deep learning models have been proposed. Fig. 1 categorizes
+popular deep learning models based on their layer connections. The
+[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h)
+abstraction of SINGA consists of multiple layers connected by directed edges. This
+abstraction is able to represent models from all three categories.
+
+ * For the feed-forward models, their connections are already directed.
+
+ * For the RNN models, we unroll them into directed connections, as shown in
+ Fig. 3.
+
+ * For the undirected connections in RBM, DBM, etc., we replace each undirected
+ connection with two directed connections, as shown in Fig. 2.
+
+<div style = "height: 200px">
+<div style = "float:left; text-align: center">
+<img src="../images/unroll-rbm.png" style="width: 280px"/> <br/>Fig. 2: Unroll RBM
+</div>
+<div style = "float:left; text-align: center; margin-left: 40px">
+<img src="../images/unroll-rnn.png" style="width: 550px"/> <br/>Fig. 3: Unroll RNN
+</div>
+</div>
+
+Specifically, the NeuralNet class is defined in
+[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) :
+
+ ...
+ vector<Layer*> layers_;
+ ...
+
+The Layer class is defined in
+[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h):
+
+ vector<Layer*> srclayers_, dstlayers_;
+ LayerProto layer_proto_; // layer configuration, including meta info, e.g., name
+ ...
+
+
+Connections with other layers are kept in the `srclayers_` and `dstlayers_` fields.
+Since there are many different feature transformations, there are
+correspondingly many different Layer implementations. Layers that have
+parameters in their feature transformation functions contain Param
+instances, e.g.,
+
+ Param weight;
+
+
+### Configure the structure of a NeuralNet instance
+
+To train a deep learning model, the first step is to write the configurations
+for the model structure, i.e., the layers and connections for the NeuralNet.
+Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol
+Buffer](https://developers.google.com/protocol-buffers/) to define the
+configuration protocol. The
+[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto)
+specifies the configuration fields for a NeuralNet instance,
+
+    message NetProto {
+      repeated LayerProto layer = 1;
+      ...
+    }
+
+The configuration is then
+
+ layer {
+ // layer configuration
+ }
+ layer {
+ // layer configuration
+ }
+ ...
+
+To configure the model structure, we just configure each layer involved in the model.
+
+ message LayerProto {
+ // the layer name used for identification
+ required string name = 1;
+ // source layer names
+ repeated string srclayers = 3;
+ // parameters, e.g., weight matrix or bias vector
+ repeated ParamProto param = 12;
+ // the layer type from the enum above
+ required LayerType type = 20;
+ // configuration for convolution layer
+ optional ConvolutionProto convolution_conf = 30;
+ // configuration for concatenation layer
+ optional ConcateProto concate_conf = 31;
+ // configuration for dropout layer
+ optional DropoutProto dropout_conf = 33;
+ ...
+ }
+
+A sample configuration for a feed-forward model is shown below,
+
+ layer {
+ name : "input"
+ type : kRecordInput
+ }
+ layer {
+      name : "fc1"
+ type : kInnerProduct
+ srclayers : "input"
+ param {
+ // configuration for parameter
+ }
+ innerproduct_conf {
+ // configuration for this specific layer
+ }
+ ...
+ }
+
+The layer type list is defined in
+[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto).
+One type (kFoo) corresponds to one child class of Layer (FooLayer) and one
+configuration field (foo_conf). All built-in layers are introduced in the [layer page](layer.html).
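+
+For instance, a dropout layer (type kDropout) would be configured through its
+dropout_conf field. The sketch below is illustrative only: the `dropout_ratio`
+field name and the source layer name are assumptions, so check DropoutProto in
+[model.proto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto) for the exact fields,
+
+    layer {
+      name : "dropout"
+      type : kDropout
+      srclayers : "fc1"
+      dropout_conf {
+        dropout_ratio : 0.5  # assumed field name; drop half of the features
+      }
+    }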
+
+## Worker
+
+At the beginning, the Worker initializes the values of the Param instances of
+each layer, either randomly (according to a user-configured distribution) or by
+loading from a [checkpoint file](). For each training iteration, the worker
+visits the layers of the neural network to compute gradients of the Param instances of
+each layer. Corresponding to the three categories of models, there are three
+different algorithms to compute the gradients of a neural network.
+
+ 1. Back-propagation (BP) for feed-forward models
+ 2. Back-propagation through time (BPTT) for recurrent neural networks
+ 3. Contrastive divergence (CD) for RBM, DBM, etc.
+
+SINGA provides these three algorithms as three Worker implementations.
+Users only need to specify in the model.conf file which algorithm
+should be used. The configuration protocol is
+
+ message ModelProto {
+ ...
+ enum GradCalcAlg {
+ // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN
+ kBP = 1;
+ // BPTT for recurrent neural networks
+ kBPTT = 2;
+ // CD algorithm for RBM, DBM etc., models
+ kCd = 3;
+ }
+ // gradient calculation algorithm
+      required GradCalcAlg alg = 8 [default = kBP];
+ ...
+ }
+
+These algorithms override the TrainOneBatch function of the Worker. E.g., the
+BPWorker implements it as
+
+ void BPWorker::TrainOneBatch(int step, Metric* perf) {
+ Forward(step, kTrain, train_net_, perf);
+ Backward(step, train_net_);
+ }
+
+The Forward function passes the raw input features of one mini-batch through
+all layers, and the Backward function visits the layers in reverse order to
+compute the gradients of the loss w.r.t. each layer's feature and each layer's
+Param objects. Different algorithms visit the layers in different orders.
+Some may traverse the neural network multiple times, e.g., the CDWorker's
+TrainOneBatch function is:
+
+    void CDWorker::TrainOneBatch(int step, Metric* perf) {
+      PositivePhase(step, kTrain, train_net_, perf);
+      NegativePhase(step, kTrain, train_net_, perf);
+      GradientPhase(step, train_net_);
+    }
+
+Each `*Phase` function visits all layers one or more times.
+All algorithms finally call two functions of the Layer class:
+
+ /**
+ * Transform features from connected layers into features of this layer.
+ *
+ * @param phase kTrain, kTest, kPositive, etc.
+ */
+ virtual void ComputeFeature(Phase phase, Metric* perf) = 0;
+ /**
+ * Compute gradients for parameters (and connected layers).
+ *
+ * @param phase kTrain, kTest, kPositive, etc.
+ */
+ virtual void ComputeGradient(Phase phase) = 0;
+
+All [Layer implementations]() must implement the above two functions.
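+
+To make the layer traversal concrete, the sketch below shows how a BP-style
+Forward and Backward pass might iterate over the layers. It is an illustrative
+simplification (assuming `net->layers()` returns the layers in topological
+order), not the exact SINGA implementation,
+
+    // Sketch: pass features forward through all layers, then visit the
+    // layers in reverse order to back-propagate gradients.
+    void Forward(int step, Phase phase, NeuralNet* net, Metric* perf) {
+      for (Layer* layer : net->layers())
+        layer->ComputeFeature(phase, perf);  // e.g., phase = kTrain
+    }
+
+    void Backward(int step, NeuralNet* net) {
+      const auto& layers = net->layers();
+      for (auto it = layers.rbegin(); it != layers.rend(); ++it)
+        (*it)->ComputeGradient(kTrain);      // reverse order for BP
+    }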
+
+
+## Updater
+
+Once the gradients of parameters are computed, the Updater will update
+parameter values. There are many SGD variants for updating parameters, like
+[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf),
+[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf),
+[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf),
+[Nesterov](http://scholar.google.com/citations?view_op=view_citation&hl=en&user=DJ8Ep8YAAAAJ&citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C)
+and SGD with momentum. The core functions of the Updater are
+
+ /**
+ * Update parameter values based on gradients
+ * @param step training step
+ * @param param pointer to the Param object
+ * @param grad_scale scaling factor for the gradients
+ */
+ void Update(int step, Param* param, float grad_scale=1.0f);
+ /**
+ * @param step training step
+ * @return the learning rate for this step
+ */
+ float GetLearningRate(int step);
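+
+As an illustration, a plain SGD update (ignoring momentum and weight decay)
+could be implemented roughly as below. This is a sketch only: the
+`mutable_data()`, `grad()` and `size()` accessors on Param are hypothetical
+stand-ins for however the value and gradient arrays are actually exposed,
+
+    // Sketch of a vanilla SGD update; Param accessors are hypothetical.
+    void Update(int step, Param* param, float grad_scale) {
+      float lr = GetLearningRate(step);        // learning rate for this step
+      float* value = param->mutable_data();    // hypothetical accessor
+      const float* grad = param->grad();       // hypothetical accessor
+      for (int i = 0; i < param->size(); ++i)  // hypothetical size()
+        value[i] -= lr * grad_scale * grad[i]; // w <- w - lr * scaled gradient
+    }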
+
+SINGA provides several built-in updaters and learning rate change methods.
+Users can configure them according to the UpdaterProto
+
+ message UpdaterProto {
+ enum UpdaterType{
+      // normal SGD with momentum and weight decay
+ kSGD = 1;
+ // adaptive subgradient, http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
+ kAdaGrad = 2;
+ // http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
+ kRMSProp = 3;
+ // Nesterov first optimal gradient method
+ kNesterov = 4;
+ }
+ // updater type
+ required UpdaterType type = 1 [default=kSGD];
+ // configuration for RMSProp algorithm
+ optional RMSPropProto rmsprop_conf = 50;
+
+ enum ChangeMethod {
+ kFixed = 0;
+ kInverseT = 1;
+ kInverse = 2;
+ kExponential = 3;
+ kLinear = 4;
+ kStep = 5;
+ kFixedStep = 6;
+ }
+ // change method for learning rate
+      required ChangeMethod lr_change = 2 [default = kFixed];
+
+      optional FixedStepProto fixedstep_conf = 40;
+ ...
+ optional float momentum = 31 [default = 0];
+ optional float weight_decay = 32 [default = 0];
+ // base learning rate
+ optional float base_lr = 34 [default = 0];
+ }
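+
+For example, a configuration consistent with the UpdaterProto above might look
+like the following (the values are illustrative only),
+
+    updater {
+      type: kSGD
+      lr_change: kFixed
+      base_lr: 0.1
+      momentum: 0.9
+      weight_decay: 0.0002
+    }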
+
+
+## Other model configuration fields
+
+Some other important configuration fields for training a deep learning model are
+listed below:
+
+ // model name, e.g., "cifar10-dcnn", "mnist-mlp"
+ string name;
+ // displaying training info for every this number of iterations, default is 0
+ int32 display_freq;
+ // total num of steps/iterations for training
+ int32 train_steps;
+ // do test for every this number of training iterations, default is 0
+ int32 test_freq;
+ // run test for this number of steps/iterations, default is 0.
+ // The test dataset has test_steps * batchsize instances.
+ int32 test_steps;
+ // do checkpoint for every this number of training steps, default is 0
+ int32 checkpoint_freq;
+
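+Putting these together, the top level of a model.conf might start like the
+following sketch (the values are illustrative only),
+
+    name: "cifar10-dcnn"
+    train_steps: 1000
+    display_freq: 30
+    test_freq: 300
+    test_steps: 100
+    checkpoint_freq: 300
+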
+The [checkpoint and restore](checkpoint.html) page has details on checkpoint-related fields.
Added: incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,327 @@
+# Neural Net
+
+---
+
+`NeuralNet` in SINGA represents an instance of a user's neural net model. As a
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](layer.html)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="../images/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as shown in Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="../images/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, are easy to configure, as their layer
+connections are directed and contain no cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayer: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayer: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayer: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayer: "hidden"
+        srclayer: "label"
+      }
+    }
+
+### Energy models
+
+<img src="../images/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models, including RBM, DBM,
+etc., the connections are undirected (i.e., Category B). To represent these models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected layers, each
+layer's source layer field should include the other layer's name.
+The full [RBM example](rbm.html) has a
+detailed neural net configuration for an RBM model, which looks like
+
+ net {
+ layer {
+ name : "vis"
+ type : kVisLayer
+ param {
+ name : "w1"
+ }
+ srclayer: "hid"
+ }
+ layer {
+ name : "hid"
+ type : kHidLayer
+ param {
+ name : "w2"
+ share_from: "w1"
+ }
+ srclayer: "vis"
+ }
+ }
+
+### RNN models
+
+For recurrent neural networks (RNN), users can remove the recurrent connections
+by unrolling the recurrent layer. For example, in Figure 3b, the original
+layer is unrolled into a new layer with 4 internal layers. In this way, the
+model becomes like a normal feed-forward model and can thus be configured similarly.
+The [RNN example](rnn.html) has a full neural net
+configuration for an RNN model.
+
+
+## Configuration for multiple nets
+
+Typically, a training job includes three neural nets, for the
+training, validation and test phases respectively. The three neural nets share most
+layers, except for the data layer, loss layer, output layer, etc. To avoid
+redundant configuration of the shared layers, users can use the `exclude`
+field to filter a layer out of a neural net, e.g., the following layer will be
+filtered out when creating the test `NeuralNet`.
+
+
+ layer {
+ ...
+ exclude : kTest # filter this layer for creating test net
+ }
+
+
+
+## Neural net partitioning
+
+A neural net can be partitioned in different ways to distribute the training
+over multiple workers.
+
+### Batch and feature dimension
+
+<img src="../images/partition_fc.png" align="center" width="400px"/>
+<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>
+
+
+Every layer's feature blob is considered a matrix whose rows are feature
+vectors. Thus, one layer can be split on two dimensions. Partitioning on
+dimension 0 (also called batch dimension) slices the feature matrix by rows.
+For instance, if the mini-batch size is 256 and the layer is partitioned into 2
+sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
+Partitioning on this dimension has no effect on the parameters, as every
+[Param](param.html) object is replicated in the sub-layers. Partitioning on dimension
+1 (also called feature dimension) slices the feature matrix by columns. For
+example, suppose the original feature vector has 50 units; after partitioning
+into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
+result in [Param](param.html) objects being split, as shown in
+Figure 4. Both the bias vector and weight matrix are
+partitioned into two sub-layers.
+
+
+### Partitioning configuration
+
+There are 4 partitioning schemes, whose configurations are given below,
+
+ 1. Partitioning each single layer into sub-layers on the batch dimension (see
+ below). It is enabled by setting the partition dimension of the layer to
+ 0, e.g.,
+
+ # with other fields omitted
+ layer {
+ partition_dim: 0
+ }
+
+ 2. Partitioning each single layer into sub-layers on the feature dimension (see
+ below). It is enabled by setting the partition dimension of the layer to
+ 1, e.g.,
+
+ # with other fields omitted
+ layer {
+ partition_dim: 1
+ }
+
+ 3. Partitioning all layers into different subsets. It is enabled by
+ configuring the location ID of a layer, e.g.,
+
+ # with other fields omitted
+ layer {
+ location: 1
+ }
+ layer {
+ location: 0
+ }
+
+
+ 4. Hybrid partitioning of strategies 1, 2 and 3. Hybrid partitioning is
+ useful for large models. An example application is to implement the
+ [idea proposed by Alex Krizhevsky](http://arxiv.org/abs/1404.5997).
+ Hybrid partitioning is configured like,
+
+ # with other fields omitted
+ layer {
+ location: 1
+ }
+ layer {
+ location: 0
+ }
+ layer {
+ partition_dim: 0
+ location: 0
+ }
+ layer {
+ partition_dim: 1
+ location: 0
+ }
+
+Currently SINGA supports strategy-2 well. The other partitioning strategies
+are under test and will be released in a later version.
+
+## Parameter sharing
+
+Parameters can be shared in two cases,
+
+ * sharing parameters among layers via user configuration. For example, the
+ visible layer and hidden layer of an RBM share the weight matrix, which is configured through
+ the `share_from` field as shown in the above RBM configuration. The
+ configurations must be the same (except for the name) for shared parameters.
+
+ * due to neural net partitioning, some `Param` objects are replicated into
+ different workers, e.g., when partitioning one layer on the batch dimension. These
+ workers share parameter values. SINGA controls this kind of parameter
+ sharing automatically; users do not need to do any configuration.
+
+ * the `NeuralNet` instances for training and testing (and validation) share most
+ layers, and thus share `Param` values.
+
+If the shared `Param` instances reside in the same process (possibly in different
+threads), they use the same chunk of memory for their values. They
+have separate memory for their gradients, however; the
+gradients are averaged by the stub or server.
+
+## Advanced user guide
+
+### Creation
+
+ static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num);
+
+The above function creates a `NeuralNet` for a given phase and returns a
+pointer to the `NeuralNet` instance. The phase is one of {kTrain,
+kValidation, kTest}. `num` indicates the number of partitions and is used
+for net partitioning. Typically, a training job includes three neural nets,
+for the training, validation and test phases respectively. The three neural
+nets share most layers, except for the data layer, loss layer, output layer,
+etc. The `Create`
+function takes in the full net configuration, including layers for training,
+validation and test. It removes layers for phases other than the specified
+phase based on the `exclude` field in the
+[layer configuration](layer.html):
+
+ layer {
+ ...
+ exclude : kTest # filter this layer for creating test net
+ }
+
+The filtered net configuration is passed to the constructor of `NeuralNet`:
+
+ NeuralNet::NeuralNet(NetProto netproto, int npartitions);
+
+The constructor first creates a graph representing the net structure in
+
+ Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);
+
+Next, it creates a layer for each node and connects layers if their nodes are
+connected.
+
+ void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);
+
+Since the `NeuralNet` instance may be shared among multiple workers, the
+`Create` function returns a pointer to the `NeuralNet` instance.
+
+### Parameter sharing
+
+`Param` sharing
+is enabled by first sharing the Param configuration (in `NeuralNet::Create`)
+to create two similar (e.g., the same shape) Param objects, and then calling
+(in `NeuralNet::CreateNetFromGraph`),
+
+ void Param::ShareFrom(const Param& from);
+
+It is also possible to share `Param`s of two nets, e.g., sharing parameters of
+the training net and the test net,
+
+    void NeuralNet::ShareParamsFrom(NeuralNet* other);
+
+It will call `Param::ShareFrom` for each Param object.
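+
+Putting the creation and sharing functions together, a typical setup might look
+roughly like the sketch below. It is an illustration only; the `neuralnet()`
+accessor on the job configuration is a hypothetical stand-in,
+
+    // Sketch: build train and test nets from one NetProto and let the
+    // test net reuse the training net's parameter values.
+    NetProto net_conf = job_conf.neuralnet();  // hypothetical accessor
+    NeuralNet* train_net = NeuralNet::Create(net_conf, kTrain, 1);
+    NeuralNet* test_net = NeuralNet::Create(net_conf, kTest, 1);
+    test_net->ShareParamsFrom(train_net);  // calls Param::ShareFrom per Param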
+
+### Access functions
+`NeuralNet` provides several access functions to get the layers and params
+of the net:
+
+ const std::vector<Layer*>& layers() const;
+    const std::vector<Param*>& params() const;
+ Layer* name2layer(string name) const;
+ Param* paramid2param(int id) const;
+
+
+### Partitioning
+
+
+#### Implementation
+
+SINGA partitions the neural net in the `CreateGraph` function, which creates one
+node for each (partitioned) layer. For example, if one layer's partition
+dimension is 0 or 1, then `npartitions` nodes are created for it; if the
+partition dimension is -1, a single node is created, i.e., no partitioning.
+Each node is assigned a partition (or location) ID. If the original layer is
+configured with a location ID, then that ID is assigned to each newly created node.
+These nodes are connected according to the connections of the original layers.
+Some connection layers are added automatically.
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer (which concatenates feature rows or
+columns) and a slice layer (which slices feature rows or columns) are
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
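+
+The node-creation step can be pictured with the following sketch; the `Graph`
+and `Node` interfaces here are simplified assumptions, not the actual API,
+
+    // Sketch: create one node per (partitioned) layer.
+    for (const LayerProto& layer : netproto.layer()) {
+      int n = (layer.partition_dim() == -1) ? 1 : npartitions;
+      for (int i = 0; i < n; i++) {
+        Node* node = graph->AddNode(layer.name() + "@" + std::to_string(i));  // hypothetical API
+        node->set_location(layer.has_location() ? layer.location() : i);      // hypothetical API
+      }
+    }
+    // Edges are then added following the original layer connections, and
+    // bridge/slice/concatenate/split layers are inserted where partitions
+    // cross workers or dimensions.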
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one
+worker. Particularly, the pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID. When
+every worker computes the gradients of the entire model parameters
+(strategy-2), we refer to this process as data parallelism. When different
+workers compute the gradients of different parameters (strategy-3 or
+strategy-1), we call this process model parallelism. The hybrid partitioning
+leads to hybrid parallelism where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters. For example, to implement the hybrid parallelism in for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.
+
Added: incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,54 @@
+# Neural Net Partition
+
+---
+
+The purpose of partitioning a neural network is to distribute the partitions onto
+different working units (e.g., threads or nodes, called workers in this article)
+and parallelize the processing.
+Another reason for partitioning is to handle large neural networks which cannot
+fit in the memory of a single node. For instance, to train models on images with high
+resolution we need large neural networks (in terms of the number of parameters).
+
+Since *Layer* is the first-class citizen in SINGA, we partition at the level of
+layers. Specifically, we support partitioning at two levels. First, users can configure
+the location (i.e., worker ID) of each layer, thereby assigning one worker
+to each layer. Second, a single layer can be partitioned over its neurons or over
+the instances (e.g., images) it processes. These are called layer partition and data partition
+respectively. We illustrate the two types of partition using a simple convolutional neural network.
+
+<img src="../images/conv-mnist.png" style="width: 220px"/>
+
+The above figure shows a convolutional neural network without any partition. It
+has 8 layers in total (one rectangle represents one layer). The first layer is
+a DataLayer (data) which reads data from local disk files/databases (or HDFS). The second layer
+is a MnistLayer which parses the records from the MNIST data to get the pixels of a batch
+of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label
+of each image in the batch. The ConvolutionalLayer (conv1) transforms the input image to a
+shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1)
+sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It
+multiplies each image with a weight matrix to generate a 10-dimensional hidden feature, which
+is then normalized by a SoftmaxLossLayer to get the prediction.
+
+<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 3 partitions using data partition.
+The red layers process 4 images of the batch, while the black and blue layers process 2 images
+each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer,
+BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm.
+Layers of the same color reside in the same worker. Data is transferred
+across workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer),
+e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
+
+<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 2 partitions using layer partition. We can
+see that each layer processes all 8 images from the batch, but different partitions process
+different parts of each image. For instance, the layer conv1-00 processes only 4 channels. The other
+4 channels are processed by conv1-01, which resides in another worker.
+
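+In terms of job configuration, the two partition types map to the
+`partition_dim` field described on the [neural net](neural-net.html) page; the
+snippet below is a sketch with illustrative values (other fields omitted),
+
+    # data partition: slice each mini-batch across sub-layers
+    layer {
+      name: "conv1"
+      partition_dim: 0
+    }
+    # layer partition: slice the neurons/channels of a layer
+    layer {
+      name: "fc1"
+      partition_dim: 1
+    }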
+
+Since the partition is done at the layer level, we can apply different partitions to
+different layers to get a hybrid partition for the whole neural network. Moreover,
+we can also specify layer locations to place different layers on different workers.
Added: incubator/singa/site/trunk/content/markdown/docs/kr/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/overview.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/overview.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,67 @@
+# Overview
+
+---
+
+SINGA is a distributed deep learning platform for training deep learning
+models over large-scale datasets. It is designed with an intuitive programming
+model based on the "layer" abstraction of neural networks.
+
+* A variety of models are supported: feed-forward networks such as Convolutional Neural Networks, energy models such as Restricted Boltzmann Machines, and Recurrent Neural Network models.
+
+* Many "layers" are provided as built-in Layers.
+
+* The SINGA architecture is designed to run synchronous, asynchronous and hybrid training.
+
+* SINGA also supports different partitioning schemes (batch and feature partitioning) to parallelize the training of models.
+
+
+## Goals
+
+Scalability: as a distributed system, improve training speed by using more resources, until a target accuracy is reached.
+
+Usability: simplify the programmer's work, such as partitioning data and models and handling network communication, which is required for efficient training of large distributed models, and make it easy to build complex models and algorithms.
+
+
+## Design principles
+
+Scalability is an important research challenge in distributed deep learning.
+SINGA is designed to maintain scalability under different training frameworks:
+
+* Synchronous: improves the efficiency of each training iteration.
+* Asynchronous: improves the convergence rate of the training.
+* Hybrid: balances efficiency and convergence rate according to the cost and resources (e.g., cluster size), improving scalability.
+
+SINGA is designed with an intuitive programming model based on the "layer" abstraction of deep learning models, so that a variety of models can easily be built and trained.
+
+## System overview
+
+<img src="../images/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow.</strong></span>
+
+Training a deep learning model means finding the optimal parameters of the
+transformation functions that generate the features used to accomplish a
+specific task (classification, prediction, etc.).
+The goodness of the parameters is measured by a loss function such as
+[Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since this
+function is usually non-linear and non-convex, it is difficult to find a
+closed-form solution.
+
+Instead, Stochastic Gradient Descent (SGD) is used.
+As shown in Figure 1, randomly initialized parameter values are updated
+iteratively so that the loss function decreases.
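+
+The update at each step can be written as the standard SGD formula (stated here
+for clarity),
+
+    \theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)
+
+where \theta_t denotes the parameter values at step t, \eta is the learning
+rate, and L is the loss function.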
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview.</strong></span>
+
+The training workload is distributed over workers and servers. As shown in Figure 2, in each loop the workers call the *TrainOneBatch* function to compute parameter gradients.
+*TrainOneBatch* visits the "layers" in sequence, following a *NeuralNet* object that describes the neural net structure.
+The computed gradients are sent to the stub of the local node, aggregated there, and sent to the corresponding servers. The servers send the updated parameters back to the workers for the next loop.
+
+
+## Job
+
+In SINGA, a "Job" refers to a "Job Configuration" that describes the neural net model, the data, the training method, the cluster topology, and so on.
+A job configuration has the following four elements, drawn in Figure 2:
+
+* [NeuralNet](neural-net.html): describes the structure of the neural net and the settings of each "layer".
+* [TrainOneBatch](train-one-batch.html): describes the algorithm suited to the model category.
+* [Updater](updater.html): describes how the parameters are updated on the server side.
+* [Cluster Topology](distributed-training.html): describes the distributed topology of the workers and servers.
+
+The job is submitted to the SINGA driver in the [main function](programming-guide.html).
+
+This process is similar to job submission in Hadoop:
+users configure their jobs in the main function.
+Hadoop users configure their own mappers and reducers, whereas SINGA users configure their "layers", Updater, etc.
Added: incubator/singa/site/trunk/content/markdown/docs/kr/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/param.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/param.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. *Basic user guide* describes how to configure for a `Param`
+object, and *Advanced user guide* provides details on implementing users'
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a `Param` object is inside a layer configuration, as
+`Param` objects are associated with layers. An example configuration is like
+
+ layer {
+ ...
+ param {
+ name : "p1"
+ init {
+ type : kConstant
+ value: 1
+ }
+ }
+ }
+
+The [SGD algorithm](overview.html) starts by initializing all
+parameters according to the user-specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to the constant value 1. The
+configuration fields of a Param object are defined in [ParamProto](../api/classsinga_1_1ParamProto.html):
+
+ * name, an identifier string. It is an optional field. If not provided, SINGA
+ will generate one based on layer name and its order in the layer.
+ * init, field for setting initialization methods.
+ * share_from, name of another `Param` object, from which this `Param` will share
+ configurations and values.
+ * lr_scale, float value to be multiplied with the learning rate when
+ [updating the parameters](updater.html)
+ * wd_scale, float value to be multiplied with the weight decay when
+ [updating the parameters](updater.html)
+
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` using one of the following built-in
+initialization methods,
+
+ * `kConstant`, set all parameters of the Param object to a constant value
+
+        type: kConstant
+        value: float # default is 1
+
+ * `kGaussian`, initialize the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # multiplier, default is 1
+
+ * `kUniform`, initialize the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # multiplier, default is 1
+
+ * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e.,
+ matrices) using `kGaussian` and then
+ multiply each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of
+ columns of the matrix.
+
+ * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
+ distribution is a uniform distribution.
+
+ * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then
+ multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
+ fan_out` sums the number of columns and rows of the matrix.
+
+For all the above initialization methods except `kConstant`, if their `value` is not
+1, every parameter will be multiplied by `value`. Users can also implement
+their own initialization methods following the *Advanced user guide*.
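+
+For instance, a weight matrix initialized from a scaled Gaussian could be
+configured as below (illustrative values),
+
+    param {
+      name : "w1"
+      init {
+        type : kGaussian
+        mean : 0.0
+        std : 0.1
+        value : 2.0  # every sampled parameter is multiplied by 2
+      }
+    }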
+
+
+## Advanced user guide
+
+This section describes the details of implementing new parameter
+initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as
+subclasses of the base `ParamGenerator` class.
+
+ class ParamGenerator {
+ public:
+ virtual void Init(const ParamGenProto&);
+ void Fill(Param*);
+
+ protected:
+ ParamGenProto proto_;
+ };
+
+The configuration of the initialization method is in `ParamGenProto`. The `Fill`
+function fills the `Param` object (passed in as an argument).
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration
+protocol message,
+
+ # in user.proto
+ message FooParamProto {
+ optional int32 x = 1;
+ }
+ extend ParamGenProto {
+      optional FooParamProto fooparam_conf = 101;
+ }
+
+The configuration of `Param` would be
+
+ param {
+ ...
+ init {
+        user_type: "FooParam" # must use user_type for user-defined methods
+ [fooparam_conf] { # must use brackets for configuring user defined messages
+ x: 10
+ }
+ }
+ }
+
+The subclass could be declared as,
+
+ class FooParamGen : public ParamGenerator {
+ public:
+ void Fill(Param*) override;
+ };
+
+Users can access the configuration fields in `Fill` by
+
+ int x = proto_.GetExtension(fooparam_conf).x();
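+
+A possible `Fill` implementation is sketched below. The accessors on `Param`
+are hypothetical stand-ins (the real class keeps its values in a `Blob<float>`),
+so treat this as an outline only,
+
+    // Sketch: fill every value of the Param with the configured x.
+    void FooParamGen::Fill(Param* param) {
+      int x = proto_.GetExtension(fooparam_conf).x();
+      float* data = param->mutable_cpu_data();  // hypothetical accessor
+      for (int i = 0; i < param->size(); ++i)   // hypothetical size()
+        data[i] = static_cast<float>(x);
+    }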
+
+To use the new initialization method, users need to register it in the
+[main function](programming-guide.html).
+
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in the configuration
+
+{% comment %}
+### Base Param class
+
+### Members
+
+ int local_version_;
+ int slice_start_;
+ vector<int> slice_offset_, slice_size_;
+
+ shared_ptr<Blob<float>> data_;
+ Blob<float> grad_;
+ ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data
+Blob). These two versions are used for synchronization. If multiple Param
+objects share the same values, they would have the same `data_` field.
+Consequently, their global version is the same. The global version is updated
+by [the stub thread](communication.html). The local version is
+updated in `Worker::Update` function which assigns the global version to the
+local version. The `Worker::Collect` function is blocked until the global
+version is larger than the local version, i.e., when `data_` is updated. In
+this way, we synchronize workers sharing parameters.
+
+In Deep learning models, some Param objects are 100 times larger than others.
+To ensure the load-balance among servers, SINGA slices large Param objects. The
+slicing information is recorded by `slice_*`. Each slice is assigned a unique
+ID starting from 0. `slice_start_` is the ID of the first slice of this Param
+object. `slice_offset_[i]` is the offset of the i-th slice in this Param
+object. `slice_size_[i]` is the size of the i-th slice. These slice information
+is used to create messages for transferring parameter values or gradients to
+different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share
+this Blob although they may share `data_`, because each layer containing a
+Param object contributes gradients. E.g., in RNNs, the recurrent layers
+share parameter values, and the gradients used for updating are averaged from
+all these recurrent layers. In SINGA, the stub thread will aggregate local
+gradients for the same Param object. The server will do a global aggregation
+of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., name and ID. It also has a
+field called `owner` which is the ID of the Param object that shares parameter
+values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+ virtual void InitValues(int version = 0); // initialize values according to `init_method`
+ void ShareFrom(const Param& other); // share `data_` from `other` Param
+ --------------
+ virtual Msg* GenGetMsg(bool copy, int slice_idx);
+ virtual Msg* GenPutMsg(bool copy, int slice_idx);
+ ... // other message related functions.
+
+Besides the functions for processing the parameter values, there is a set of
+functions for generating and parsing messages. These messages are for
+transferring parameter values or gradients between workers and servers. Each
+message corresponds to one Param slice. If `copy` is false, it means the
+receiver of this message is in the same process as the sender. In such case,
+only pointers to the memory of parameter value (or gradient) are wrapped in
+the message; otherwise, the parameter values (or gradients) should be copied
+into the message.
+
+
+## Implementing Param subclass
+Users can extend the base Param class to implement their own parameter
+initialization methods and message transferring protocols. Similar to
+implementing a new Layer subclass, users can create Google protocol buffer
+messages for configuring the Param subclass. The subclass, denoted as FooParam,
+should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam); // kFooParam should be different from 0, which is for the base Param type
+
+
+ * type, an integer representing the `Param` type. Currently SINGA provides one
+ `Param` implementation with type 0 (the default type). If users want
+ to use their own Param implementation, they should extend the base Param
+ class and configure this field with `kUserParam`.
+
+{% endcomment %}
Added: incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,98 @@
+# Programmer Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+ * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+ * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+ * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+ * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components, while the *Advanced user guide* section presents details
+on writing a user's own main function to register components implemented by
+the user. In addition, the training data must be prepared, which follows the same
+[process](data.html) for both advanced users and basic users.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. For this case, a job configuration file written as a google protocol
+buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided in the command line,
+
+ ./bin/singa-run.sh -conf <path to job conf> [-resume] [-test]
+
+* `-resume` is for continuing the training from last [checkpoint](checkpoint.html).
+* `-test` is for testing the performance of a previously trained model and extracting features for new data;
+more details are available [here](test.html).
+
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g., an
+[Updater](updater.html), the user has to write a main function to
+register these components. This is similar to Hadoop's main function. Generally,
+the main function should
+
+ * initialize SINGA, e.g., setup logging.
+
+ * register user-defined components.
+
+ * create and pass the job configuration to the SINGA driver.
+
+An example main function is shown below,
+
+ #include <string>
+ #include "singa.h"
+ #include "user.h" // header for user code
+
+ int main(int argc, char** argv) {
+ singa::Driver driver;
+ driver.Init(argc, argv);
+ bool resume;
+ // parse resume option from argv.
+
+ // register user defined layers
+ driver.RegisterLayer<FooLayer, std::string>("kFooLayer");
+ // register user defined updater
+ driver.RegisterUpdater<FooUpdater, std::string>("kFooUpdater");
+ ...
+ auto jobConf = driver.job_conf();
+ // update jobConf
+
+ driver.Submit(resume, jobConf);
+ return 0;
+ }
+
+The Driver class' `Init` method will load a job configuration file provided by
+users as a command line argument (`-conf <job conf>`). It contains at least the
+cluster topology, and it returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker or Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver, which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable, e.g., named *mysinga*. To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+ ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.
+
Added: incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,95 @@
+# Programming Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+ * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections;
+ * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+ * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+ * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components, while the *Advanced user guide* section presents details
+on writing a user's own main function to register components implemented by
+the user. In addition, the training data must be prepared, which follows the same
+[process](data.html) for both advanced users and basic users.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. For this case, a job configuration file written as a google protocol
+buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided in the command line,
+
+ ./bin/singa-run.sh -conf <path to job conf> [-resume]
+
+`-resume` is for continuing the training from last
+[checkpoint](checkpoint.html).
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g., an
+[Updater](updater.html), the user has to write a main function to
+register these components. This is similar to Hadoop's main function. Generally,
+the main function should
+
+ * initialize SINGA, e.g., setup logging.
+
+ * register user-defined components.
+
+ * create and pass the job configuration to the SINGA driver.
+
+
+An example main function is shown below,
+
+ #include "singa.h"
+ #include "user.h" // header for user code
+
+ int main(int argc, char** argv) {
+ singa::Driver driver;
+ driver.Init(argc, argv);
+ bool resume;
+ // parse resume option from argv.
+
+ // register user defined layers
+ driver.RegisterLayer<FooLayer>(kFooLayer);
+ // register user defined updater
+ driver.RegisterUpdater<FooUpdater>(kFooUpdater);
+ ...
+ auto jobConf = driver.job_conf();
+ // update jobConf
+
+ driver.Train(resume, jobConf);
+ return 0;
+ }
+
+The Driver class' `Init` method will load a job configuration file provided by
+users as a command line argument (`-conf <job conf>`). It contains at least the
+cluster topology, and it returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker or Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver, which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable, e.g., named *mysinga*. To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+ ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.
Added: incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,177 @@
+# Quick Start
+
+---
+
+## SINGA installation
+
+Please refer to the [installation](installation.html) page for instructions on installing SINGA.
+
+### Starting Zookeeper
+
+SINGA training uses [zookeeper](https://zookeeper.apache.org/). First make sure the zookeeper service has been started.
+
+If you installed zookeeper using the provided thirdparty script, run the following script,
+
+    # goto top level folder
+    cd SINGA_ROOT
+    ./bin/zk-service.sh start
+
+(`./bin/zk-service.sh stop` stops the zookeeper service.)
+
+To start zookeeper on a port other than the default one, edit `conf/singa.conf`:
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+## Running in standalone mode
+
+Running SINGA in standalone mode means running it without a cluster manager like
+[Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+
+### Training on a single node
+
+A single process is launched.
+As an example, we train a
+[CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)
+over the [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset.
+The hyper-parameters are set following
+[cuda-convnet](https://code.google.com/p/cuda-convnet/).
+See the [CNN example](cnn.html) page for more details.
+
+
+#### Data and job configuration
+
+Download the dataset and create the data shards for training and testing as follows,
+
+    cd examples/cifar10/
+    cp Makefile.example Makefile
+    make download
+    make create
+
+The training and test datasets are created in the *cifar10-train-shard*
+and *cifar10-test-shard* folders respectively. An *image_mean.bin* file, which
+describes the mean feature of all images, is also generated.
+
+All source code needed for training the CNN model is included in SINGA; no
+extra code needs to be written. You just run the script (*../../bin/singa-run.sh*)
+with the job configuration file (*job.conf*) specified.
+To change or add SINGA code, please refer to the [programming guide](programming-guide.html).
+
+#### Training without parallelism
+
+By default, the cluster topology has one worker and one server.
+Neither the data nor the neural net is partitioned.
+
+To start training, run the following script,
+
+    # goto top level folder
+    cd ../../
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+To list the currently running jobs,
+
+    ./bin/singa-console.sh list
+
+    JOB ID    |NUM PROCS
+    ----------|-----------
+    24        |1
+
+To kill a job,
+
+    ./bin/singa-console.sh kill JOB_ID
+
+Logs and job information are stored in the */tmp/singa-log* folder, which
+can be changed via `log-dir` in the *conf/singa.conf* file.
+
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](architecture.html) is enabled by launching
+multiple worker groups. For example, change the original *job.conf* as shown above.
+By default, one worker group is configured to have one worker.
+Since the above configuration sets two workers in one process, the two worker
+groups will run in the same process.
+As a result, this runs the in-memory [Downpour](frameworks.html) training framework.
+
+Users do not need to worry about distributing the data:
+a random offset is applied to the data for each worker group, so that
+each worker handles a different data partition.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        sharddata_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+Run the script,
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [synchronous training](architecture.html) is enabled by launching
+multiple workers within one worker group. For example, change the original
+*job.conf* file as shown above. The above configuration sets two workers in
+one worker group. The workers synchronize with each other within the group.
+This runs as the in-memory [sandblaster](frameworks.html) framework.
+The model is partitioned among the two workers: each layer is sliced over the
+two workers. The sliced layer is the same as the original layer except that
+the number of feature instances becomes `B/g`, where `B` is the number of
+instances in a mini-batch and `g` is the number of workers in a group.
+There are also [other schemes](neural-net.html) for partitioning the layers (neural net).
+
+All other settings are the same as for "training without parallelism",
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+### Training in a cluster
+
+We extend the above training frameworks to a cluster by updating the cluster configuration with,
+
+    nworker_per_procs: 1
+
+Every process then creates only one worker thread. As a result, the workers
+are created in different processes (i.e., nodes). The *hostfile* in
+*SINGA_ROOT/conf/* must list the nodes of the cluster,
+
+e.g.,
+
+    logbase-a01
+    logbase-a02
+
+The zookeeper location must also be configured,
+
+e.g.,
+
+    # conf/singa.conf
+    zookeeper_host: "logbase-a01"
+
+The script is run in the same way as for single-node training,
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+## Running with Mesos
+
+*in progress*...
+
+## Next steps
+
+Please refer to the [programming guide](programming-guide.html) for details on how to change or add SINGA code.
Added: incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,365 @@
+# RBM Example
+
+---
+
+This example uses SINGA to train 4 RBM models and one auto-encoder model over the
+[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
+to reduce the dimensionality of the MNIST image features. The RBM models are trained
+to initialize the parameters of the auto-encoder model. This example application is
+from [Hinton's Science paper](http://www.cs.toronto.edu/~hinton/science.pdf).
+
+## Running instructions
+
+Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.
+
+The MNIST dataset has 70,000 handwritten digit images. The
+[data preparation](data.html) page
+has details on converting this dataset into a SINGA-recognizable format. Users can
+simply run the following commands to download and convert the dataset.
+
+ # at SINGA_ROOT/examples/mnist/
+ $ cp Makefile.example Makefile
+ $ make download
+ $ make create
+
+The training is separated into two phases, namely pre-training and fine-tuning.
+The pre-training phase trains 4 RBMs in sequence,
+
+ # at SINGA_ROOT/
+ $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
+ $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
+ $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
+ $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf
+
+The fine-tuning phase trains the auto-encoder by,
+
+ $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf
+
+
+## Training details
+
+### RBM1
+
+<img src="../images/example-rbm1.png" align="center" width="200px"/>
+<span><strong>Figure 1 - RBM1.</strong></span>
+
+The neural net structure for training RBM1 is shown in Figure 1.
+The data layer and parser layer provide features for training RBM1.
+The visible layer (connected with the parser layer) of RBM1 accepts the image feature
+(784 dimensions). The hidden layer is set to have 1000 neurons (units).
+These two layers are configured as,
+
+ layer{
+ name: "RBMVis"
+ type: kRBMVis
+ srclayers:"mnist"
+ srclayers:"RBMHid"
+ rbm_conf{
+ hdim: 1000
+ }
+ param{
+ name: "w1"
+ init{
+ type: kGaussian
+ mean: 0.0
+ std: 0.1
+ }
+ }
+ param{
+ name: "b11"
+ init{
+ type: kConstant
+ value: 0.0
+ }
+ }
+ }
+
+ layer{
+ name: "RBMHid"
+ type: kRBMHid
+ srclayers:"RBMVis"
+ rbm_conf{
+ hdim: 1000
+ }
+ param{
+ name: "w1_"
+ share_from: "w1"
+ }
+ param{
+ name: "b12"
+ init{
+ type: kConstant
+ value: 0.0
+ }
+ }
+ }
+
+
+
+For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
+`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
+the `share_from` field to enable [parameter sharing](param.html),
+as shown above for the params `w1` and `w1_`.
+
+[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
+is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
+Following Hinton's paper, we configure the [updating protocol](updater.html)
+as follows,
+
+ # Updater Configuration
+ updater{
+ type: kSGD
+ momentum: 0.2
+ weight_decay: 0.0002
+ learning_rate{
+ base_lr: 0.1
+ type: kFixed
+ }
+ }
+
+Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
+configure the `workspace` field to specify a path for the checkpoint folder.
+For example, if we configure it as,
+
+ cluster {
+ workspace: "examples/rbm/rbm1/"
+ }
+
+Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.
+
+### RBM2
+<img src="../images/example-rbm2.png" align="center" width="200px"/>
+<span><strong>Figure 2 - RBM2.</strong></span>
+
+Figure 2 shows the net structure for training RBM2.
+The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
+is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
+from RBM1.
+The neural net configuration is (with the data layer and parser layer omitted),
+
+ layer{
+ name: "Inner1"
+ type: kInnerProduct
+ srclayers:"mnist"
+ innerproduct_conf{
+ num_output: 1000
+ }
+ param{ name: "w1" }
+ param{ name: "b12"}
+ }
+
+ layer{
+ name: "Sigmoid1"
+ type: kSigmoid
+ srclayers:"Inner1"
+ }
+
+ layer{
+ name: "RBMVis"
+ type: kRBMVis
+ srclayers:"Sigmoid1"
+ srclayers:"RBMHid"
+ rbm_conf{
+ hdim: 500
+ }
+ param{
+ name: "w2"
+ ...
+ }
+ param{
+ name: "b21"
+ ...
+ }
+ }
+
+ layer{
+ name: "RBMHid"
+ type: kRBMHid
+ srclayers:"RBMVis"
+ rbm_conf{
+ hdim: 500
+ }
+ param{
+ name: "w2_"
+ share_from: "w2"
+ }
+ param{
+ name: "b22"
+ ...
+ }
+ }
+
+To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,
+
+ checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
+ cluster{
+ workspace: "examples/rbm/rbm2"
+ }
+
+The workspace is changed for checkpointing `w2`, `b21` and `b22` into
+*examples/rbm/rbm2/*.
+
+### RBM3
+
+<img src="../images/example-rbm3.png" align="center" width="200px"/>
+<span><strong>Figure 3 - RBM3.</strong></span>
+
+Figure 3 shows the net structure for training RBM3. In this model, a layer with
+250 units is added as the hidden layer of RBM3. The visible units of RBM3
+accept the output from the Sigmoid2 layer. The parameters of Inner1 and Inner2 are set to
+`w1, b12, w2, b22`, which can be loaded from the checkpoint file of RBM2,
+i.e., "examples/rbm/rbm2/".
+
+### RBM4
+
+
+<img src="../images/example-rbm4.png" align="center" width="200px"/>
+<span><strong>Figure 4 - RBM4.</strong></span>
+
+Figure 4 shows the net structure for training RBM4. It is similar to Figure 3,
+but according to [Hinton's Science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
+top RBM (RBM4) have stochastic real-valued states drawn from a unit-variance
+Gaussian whose mean is determined by the input from the RBM's logistic visible
+units. So we add a `gaussian` field in the RBMHid layer to control the
+sampling distribution (Gaussian or Bernoulli). In addition, this
+RBM has a much smaller learning rate (0.001). The neural net configuration for
+RBM4 and the updating protocol is (with the data layer and parser
+layer omitted),
+
+ # Updater Configuration
+ updater{
+ type: kSGD
+ momentum: 0.9
+ weight_decay: 0.0002
+ learning_rate{
+ base_lr: 0.001
+ type: kFixed
+ }
+ }
+
+ layer{
+ name: "RBMVis"
+ type: kRBMVis
+ srclayers:"Sigmoid3"
+ srclayers:"RBMHid"
+ rbm_conf{
+ hdim: 30
+ }
+ param{
+ name: "w4"
+ ...
+ }
+ param{
+ name: "b41"
+ ...
+ }
+ }
+
+ layer{
+ name: "RBMHid"
+ type: kRBMHid
+ srclayers:"RBMVis"
+ rbm_conf{
+ hdim: 30
+ gaussian: true
+ }
+ param{
+ name: "w4_"
+ share_from: "w4"
+ }
+ param{
+ name: "b42"
+ ...
+ }
+ }
+
+### Auto-encoder
+In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
+networks that are initialized using the parameters from the previous 4 RBMs.
+
+<img src="../images/example-autoencoder.png" align="center" width="500px"/>
+<span><strong>Figure 5 - Auto-Encoders.</strong></span>
+
+
+Figure 5 shows the neural net structure for training the auto-encoder.
+[Back propagation (kBP)](train-one-batch.html) is
+configured as the algorithm for `TrainOneBatch`. We use the same cluster
+configuration as for the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with a
+fixed learning rate.
+
+ ### Updater Configuration
+ updater{
+ type: kAdaGrad
+ learning_rate{
+ base_lr: 0.01
+ type: kFixed
+ }
+ }
+
+
+
+According to [Hinton's Science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
+we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
+configuration is (with some of the middle layers omitted),
+
+ layer{ name: "data" }
+ layer{ name:"mnist" }
+ layer{
+ name: "Inner1"
+ param{ name: "w1" }
+ param{ name: "b12" }
+ }
+ layer{ name: "Sigmoid1" }
+ ...
+ layer{
+ name: "Inner8"
+ innerproduct_conf{
+ num_output: 784
+ transpose: true
+ }
+ param{
+ name: "w8"
+ share_from: "w1"
+ }
+ param{ name: "b11" }
+ }
+ layer{ name: "Sigmoid8" }
+
+ # Euclidean Loss Layer Configuration
+ layer{
+ name: "loss"
+ type:kEuclideanLoss
+ srclayers:"Sigmoid8"
+ srclayers:"mnist"
+ }
+
+To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as,
+
+ ### Checkpoint Configuration
+    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
+    checkpoint_path: "examples/rbm/rbm4/checkpoint/step6000-worker0"
+
+
+## Visualization Results
+
+<div>
+<img src="../images/rbm-weight.PNG" align="center" width="300px"/>
+
+<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
+<br/>
+<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
+<span><strong>Figure 7 - Top layer features.</strong></span>
+</div>
+
+Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
+Gabor-like filters are learned. Figure 7 depicts the features extracted from
+the top layer of the auto-encoder, where one point represents one image.
+Different colors represent different digits. We can see that most images are
+well clustered according to the ground truth.