Posted to commits@singa.apache.org by wa...@apache.org on 2016/04/12 08:22:21 UTC

svn commit: r1738695 [2/10] - in /incubator/singa/site/trunk/content: ./ markdown/docs/ markdown/docs/zh/ markdown/releases/ markdown/v0.2.0/ markdown/v0.2.0/jp/ markdown/v0.2.0/kr/ markdown/v0.2.0/zh/

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/general-rnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/general-rnn.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/general-rnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/general-rnn.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,203 @@
+# RNN in SINGA
+
+---
+
+Recurrent neural networks (RNN) are widely used for modelling sequential data,
+e.g., natural language sentences. In this page, we describe how to implement an
+RNN application (or model) using SINGA's built-in RNN layers. We will
+use the [char-rnn model](https://github.com/karpathy/char-rnn) as an example,
+which trains over sentences or source code, with each character as an input
+unit. In particular, we will train an RNN using GRU over the
+[Linux kernel source code](http://cs.stanford.edu/people/karpathy/char-rnn/).
+After training, we expect the model to generate meaningful code, like the
+sample shown by [Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
+There is also a [vanilla RNN example](rnn.html) for language modelling using
+user-defined RNN layers, which differs from using the built-in RNN layers
+described on this page.
+
+```
+/*
+ * If this error is set, we will need anything right after that BSD.
+ */
+static void action_new_function(struct s_stat_info *wb)
+{
+  unsigned long flags;
+  int lel_idx_bit = e->edd, *sys & ~((unsigned long) *FIRST_COMPAT);
+  buf[0] = 0xFFFFFFFF & (bit << 4);
+  min(inc, slist->bytes);
+  printk(KERN_WARNING "Memory allocated %02x/%02x, "
+      "original MLL instead\n"),
+    min(min(multi_run - s->len, max) * num_data_in),
+    frame_pos, sz + first_seg);
+  div_u64_w(val, inb_p);
+  spin_unlock(&disk->queue_lock);
+  mutex_unlock(&s->sock->mutex);
+  mutex_unlock(&func->mutex);
+  return disassemble(info->pending_bh);
+}
+```
+
+## User configuration
+
+The major differences to the configuration of other models, e.g., feed-forward
+models, are:
+
+1. the training algorithm should be changed to BPTT (back-propagation through time);
+2. the layers and their connections should be configured differently.
+
+The train-one-batch algorithm can be configured simply as
+
+    train_one_batch {
+      alg: kBPTT
+    }
+
+Next, we introduce the configuration of the neural net.
+
+<img src="../images/char-rnn.png" style="width: 550px"/>
+<p><strong> Fig.1 - Illustration of the structure of the Char-RNN model</strong></p>
+
+Fig.1 illustrates the net structure of the char-rnn model. The input layer
+buffers all training data (the Linux kernel code is about 6MB). For each
+iteration, it reads `unroll_len + 1` (`unroll_len` is configured by users)
+successive characters, e.g., "int a;", and passes the first `unroll_len`
+characters to the `OneHotLayer`s (one character per layer). Every `OneHotLayer`
+converts its character into a one-hot vector representation. The input layer
+passes the last `unroll_len` characters as labels to the `RNNLabelLayer` (the
+label of the i-th character is the (i+1)-th character, i.e., the objective is
+to predict the next character). Each `GRULayer` receives a one-hot vector and
+the hidden feature vector from its preceding `GRULayer`. After some feature
+transformation, its own feature vector is passed to an inner-product layer and
+to the next `GRULayer`. The i-th `SoftmaxLossLayer` measures the cross-entropy
+loss for predicting the i-th character. According to Karpathy, there could be
+another stack of `GRULayer`s on top of the first stack, which improves the
+performance if there is enough training data. The layer configuration is
+similar to that of other models, e.g., feed-forward models. The major
+difference is in the connection configuration.
+
+### Unrolling length
+
+To model long-range dependencies, recurrent layers need to be unrolled many
+times, denoted as `unroll_len` (e.g., 50). According to our unified neural net
+representation, the neural net should have configurations for `unroll_len`
+recurrent layers. It is tedious for users to configure these layers manually.
+Hence, SINGA makes `unroll_len` a configuration field of each layer. For
+example, to unroll the `GRULayer`, users just configure it as,
+
+    layer {
+      type: kGRU
+      unroll_len: 50
+    }
+
+Not only is the `GRULayer` unrolled; other layers, like `InnerProductLayer` and
+`SoftmaxLossLayer`, are also unrolled. To simplify the configuration, SINGA
+provides an `unroll_len` field in the net configuration, which sets the
+`unroll_len` of each layer configuration if the `unroll_len` is not configured
+explicitly for that layer. For instance, SINGA would set the `unroll_len` of
+the `GRULayer` to 50 implicitly for the following configuration.
+
+    net {
+      unroll_len: 50
+      layer {
+        type: kCharRNNInput
+        unroll_len: 1  // configure it explicitly
+      }
+      layer {
+        type: kGRU
+        // no configuration for unroll_len
+      }
+    }
+
+### ConnectionType
+<img src="http://karpathy.github.io/assets/rnn/diags.jpeg" style="width: 550px"/>
+<p><strong> Fig.2 - Different RNN structures from [Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)</strong></p>
+
+There can be many types of connections between layers in RNN models, as shown
+by Karpathy in Fig.2. For each `srclayer`, there is a `connection_type`.
+Taking the i-th `srclayer` as an example, if its connection type is,
+
+* kOneToOne, then each unrolled layer is connected with one unrolled layer from the i-th `srclayer`;
+* kOneToALL, then each unrolled layer is connected with all unrolled layers from the i-th `srclayer`.
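+
+As a sketch (with hypothetical names; unrolled layer names follow the
+`<index>#<name>` convention described in the Implementation section below),
+the source layers of the i-th unrolled layer would be resolved as,
+
+```
+#include <string>
+#include <vector>
+
+enum ConnectionType { kOneToOne, kOneToALL };
+
+// Hypothetical sketch: resolve the source layers of the i-th unrolled layer.
+// `srcs` holds the unrolled layers of one srclayer, e.g., {"0#gru", "1#gru"}.
+std::vector<std::string> SrcOfUnrolledLayer(
+    const std::vector<std::string>& srcs, ConnectionType type, int i) {
+  if (type == kOneToOne)
+    return {srcs[i]};  // connect only with the i-th unrolled source layer
+  return srcs;         // kOneToALL: connect with all unrolled source layers
+}
+```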
+
+## Implementation
+
+### Neural net configuration preprocessing
+
+The user-configured neural net is preprocessed to unroll the recurrent layers,
+i.e., duplicating the configuration of the `GRULayer`s, renaming each layer
+with its unrolling index, and re-configuring the `srclayer` field. After
+preprocessing, each layer's name is changed to
+`<unrolling_index>#<user_configured_name>`. Consequently, the (unrolled) neural
+net configuration passed to the NeuralNet class includes all layers and their
+connections. The NeuralNet class creates and sets up each layer in the same way
+as for other models. For example, after partitioning, each layer's name is
+changed to `<layer_name>@<partition_index>`. One difference is that it has some
+special code for sharing Param data and grad Blobs among layers unrolled from
+the same original layer.
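+
+A rough sketch of the renaming step (`LayerProto` here is a simplified
+stand-in for the actual configuration message, not the real SINGA code):
+
+```
+#include <string>
+#include <vector>
+
+// Simplified stand-in for the layer configuration message.
+struct LayerProto {
+  std::string name;
+  int unroll_len = 1;
+};
+
+// Duplicate a layer configuration unroll_len times, renaming each copy
+// with its unroll index, e.g., "gru" -> "0#gru", "1#gru", ...
+std::vector<LayerProto> Unroll(const LayerProto& conf) {
+  std::vector<LayerProto> unrolled;
+  for (int i = 0; i < conf.unroll_len; ++i) {
+    LayerProto copy = conf;
+    copy.name = std::to_string(i) + "#" + conf.name;
+    unrolled.push_back(copy);
+  }
+  return unrolled;
+}
+```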
+
+Users can visualize the neural net structure using the Python script `tool/graph.py`
+and the files in *WORKSPACE/visualization/*. For example, after the training program
+is started,
+
+    python tool/graph.py examples/char-rnn/visualization/train_net.json
+
+The generated image file is shown in Fig.3 for `unroll_len=5`,
+
+<img src="../images/char-rnn-net.jpg" style="width: 550px"/>
+<p><strong> Fig.3 - Net structure generated by SINGA</strong></p>
+
+### BPTTWorker
+
+The BPTT (back-propagation through time) algorithm is typically used to compute
+gradients of the objective loss w.r.t. parameters for RNN models. It forward
+propagates through all unrolled layers (i.e., time points) to compute the
+features of each layer, and backward propagates to compute the gradients of the
+parameters. It is the same as the BP algorithm for feed-forward models if the
+recurrent layers are unrolled an infinite number of times. In practice, due to
+memory constraints, truncated BPTT is widely used. It unrolls the recurrent
+layers a fixed (truncated) number of times (controlled by `unroll_len`). In
+SINGA, a BPTTWorker is provided to run the truncated BPTT algorithm for each
+mini-batch (i.e., iteration). The pseudo code is
+
+```
+BPTTWorker::Forward(phase, net) {
+  for each layer in net
+    if layer.unroll_index() == 0
+      Get(layer.params());   // fetch params values from servers
+    srclayers = layer.srclayer();
+    if phase & kTest
+      srclayers.push_back(net->GetContextLayer(layer))
+    layer.ComputeFeature(phase, srclayers)
+}
+
+BPTTWorker::Backward(phase, net) {
+  for each layer in reverse(net.layers())
+    layer.ComputeGradient(layer.srclayers())
+    if layer.unroll_index() == 0
+      Update(layer.params());   // send params gradients to servers
+}
+```
+
+The testing phase is processed specially, because it may sample a long
+sequence of data (e.g., a piece of Linux kernel code), which requires many
+unrolled layers (e.g., more than 1000 characters/layers), but we cannot unroll
+the recurrent layers too many times due to memory constraints.
+The special line in the pseudo code above adds the 0-th unrolled layer as one
+of the layer's own source layers. Consequently, it dynamically adds a recurrent
+connection to the recurrent layer (e.g., the GRULayer). We can then sample from
+the model indefinitely. Taking the char-rnn model as an example, the test job
+can be configured as
+
+    test_steps: 10000
+    train_one_batch {
+      alg: kBPTT
+    }
+    net {
+      // do not set the unroll_len
+      layer {
+        // do not set the unroll_len
+      }
+      …
+    }
+
+The instructions for [running test](test.html) are the same as for feed-forward
+models.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/gpu.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/gpu.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/gpu.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/gpu.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,100 @@
+# Training on GPU
+
+---
+
+Since GPUs are much faster than CPUs for linear algebra operations,
+it is essential to support the training of deep learning models (which involve
+a lot of linear algebra operations) on GPU cards.
+SINGA now supports training on a single node (i.e., process) with multiple GPU
+cards. Training in a GPU cluster with multiple nodes is under development.
+
+## Instructions
+
+### Compilation
+To enable training on GPU, you need to compile SINGA with [CUDA](http://www.nvidia.com/object/cuda_home_new.html) from Nvidia,
+
+    ./configure --enable-cuda --with-cuda=<path to cuda folder>
+
+In addition, if you want to use Nvidia's [cuDNN library](https://developer.nvidia.com/cudnn) for convolutional neural
+networks, you need to enable CUDNN,
+
+
+    ./configure --enable-cuda --with-cuda=<path to cuda folder> --enable-cudnn --with-cudnn=<path to cudnn folder>
+
+SINGA now supports CUDNN V3.0.
+
+
+### Configuration
+
+The job configuration for GPU training is similar to that for training on CPU.
+There is one more field to configure, `gpu`, which indicates the device ID of
+the GPU you want to use. The simplest configuration is
+
+
+    # job.conf
+    ...
+    gpu: 0
+    ...
+
+This configuration will run the worker on GPU 0. If you want to launch multiple
+workers, each on a separate GPU, you can configure it as
+
+    # job.conf
+    ...
+    gpu: 0
+    gpu: 2
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_process: 2
+    }
+
+Using the above configuration, SINGA would partition each mini-batch evenly
+onto two workers, which run on GPU 0 and GPU 2 respectively. For more information
+on running multiple workers in a single node, please refer to
+[Training Framework](frameworks.html). Please be careful to configure the same
+number of workers and `gpu` fields. Otherwise some workers would run on GPU and
+the rest would run on CPU. This kind of hybrid training is not well supported for now.
+
+
+For some layers, the implementation is transparent to GPU/CPU, e.g., the
+InnerProductLayer, GRULayer, ReLULayer, etc. Hence, you can use the same
+configuration for these layers to run on GPU or CPU. For other layers,
+especially the layers involved in ConvNets, SINGA uses different implementations
+for GPU and CPU. In particular, the GPU version is implemented using the cuDNN
+library. To train a ConvNet on GPU, you configure the layers as
+
+    layer {
+      type: kCudnnConv
+      ...
+    }
+    layer {
+      type: kCudnnPool
+      ...
+    }
+
+The [cifar10 example](cnn.html) and [Alexnet example](alexnet.html) have complete
+configurations for ConvNet.
+
+## Implementation details
+
+SINGA implements GPU training by assigning each worker a GPU device at the
+beginning of training (by the Driver class). Then the worker can call GPU
+functions and run them on the assigned GPU. GPU is typically used for the
+linear algebra computation in layer functions, because GPU is good at such
+computation. There is a Context singleton, which stores the handles and random
+generators of each device. The layer code should detect its running device and
+then call the CPU or GPU functions accordingly.
+
+To make the layer implementation easier,
+SINGA provides some linear algebra functions (in *math_blob.h*), which are
+transparent to the running device for users. Internally, they query the Context
+singleton to get the device information and call CPU or GPU routines to do the
+computation. Consequently, users can implement layers without being aware of
+the underlying running device.
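+
+As a rough illustration, a device-transparent function in the spirit of
+*math_blob.h* could look like the sketch below. All names here (`Blob`,
+`Context`, `CpuGEMM`, `GpuGEMM`) are simplified stand-ins, not the actual
+SINGA API.
+
+    // Simplified stand-ins for illustration only.
+    struct Blob {};
+    void CpuGEMM(const Blob&, const Blob&, Blob*) { /* e.g., call OpenBLAS */ }
+    void GpuGEMM(const Blob&, const Blob&, Blob*) { /* e.g., call cuBLAS */ }
+
+    class Context {  // minimal stand-in for the Context singleton
+     public:
+      static Context* Instance() { static Context ctx; return &ctx; }
+      int device_id() const { return device_id_; }  // < 0 means CPU
+      void set_device_id(int id) { device_id_ = id; }
+     private:
+      int device_id_ = -1;
+    };
+
+    // A device-transparent wrapper: query the Context singleton and
+    // dispatch to the CPU or GPU implementation.
+    void GEMM(const Blob& A, const Blob& B, Blob* C) {
+      if (Context::Instance()->device_id() < 0)
+        CpuGEMM(A, B, C);
+      else
+        GpuGEMM(A, B, C);
+    }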
+
+If the functionality cannot be implemented using the SINGA-provided functions in
+*math_blob.h*, the layer code needs to handle the CPU and GPU devices explicitly
+by querying the Context singleton.  For layers that cannot run on GPU, e.g.,
+input/output layers and connection layers which have little computation but much
+IO or network workload, there is no need to consider the GPU device.
+When these layers are configured in a neural net, they will run on CPU (since
+they don't call GPU functions).
+

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/hdfs.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/hdfs.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/hdfs.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/hdfs.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,128 @@
+# Using HDFS with SINGA
+
+This guide explains how to make use of HDFS as the data store for SINGA jobs. 
+
+1. [Quick start using Docker](#quickstart)
+2. [Setup HDFS](#hdfs)
+3. [Examples](#examples)    
+
+---
+<a name="quickstart"></a>
+## Quick start using Docker 
+
+We provide a Docker container built on top of `singa/mesos` (see the <a href="http://singa.incubator.apache.org/docs/docker.html">guide on building SINGA on Docker</a>). 
+
+```
+git clone https://github.com/ug93tad/incubator-singa
+cd incubator-singa
+git checkout SINGA-97-docker
+cd tool/docker/hdfs
+sudo docker build -t singa/hdfs .
+```
+
+Once built, the container image `singa/hdfs` contains the HDFS C++ library (`libhdfs3`) and the latest SINGA code. Many distributed nodes can be launched, and HDFS set up, by following the <a href="http://singa.incubator.apache.org/docs/mesos.html">guide for running distributed SINGA on Mesos</a>. 
+
+In the following, we assume the HDFS setup with `node0` being the namenode, and `nodei (i>0)` being the datanodes. 
+
+<a name="hdfs"></a>
+## Setup HDFS 
+There are at least two C/C++ client libraries for interacting with HDFS. One is from Hadoop (`libhdfs`), which is a <a href="https://wiki.apache.org/hadoop/LibHDFS">JNI-based library</a>, meaning that communication goes through a JVM. The other is `libhdfs3`, a <a href="https://github.com/PivotalRD/libhdfs3">native C++ library developed by Pivotal</a>, in which the client communicates directly with HDFS via RPC. The current implementation uses the latter. 
+
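+For reference, `libhdfs3` keeps the same C API as Hadoop's `libhdfs` (the
+`hdfs.h` header). A minimal read sketch, assuming the namenode runs at
+`node0:9000` as in the examples below, could look like,
+
+```
+#include <cstdio>
+#include <fcntl.h>
+#include "hdfs/hdfs.h"  // header installed by libhdfs3
+
+int main() {
+  // Connect to the namenode (host/port here match the examples below).
+  hdfsFS fs = hdfsConnect("node0", 9000);
+  if (fs == nullptr) return 1;
+  // Open with default buffer size, replication and block size.
+  hdfsFile file = hdfsOpenFile(fs, "/examples/cifar10/train_data.bin",
+                               O_RDONLY, 0, 0, 0);
+  if (file == nullptr) return 1;
+  char buf[4096];
+  tSize nread = hdfsRead(fs, file, buf, sizeof(buf));
+  std::printf("read %d bytes\n", static_cast<int>(nread));
+  hdfsCloseFile(fs, file);
+  hdfsDisconnect(fs);
+  return 0;
+}
+```
+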
+1. Install `libhdfs3`: follow the <a href="https://github.com/PivotalRD/libhdfs3#installation">official guide</a>. 
+
+2. **Additional setup**: recent versions of Hadoop (>2.4.x) support short-circuit local reads, which bypass network communication (TCP sockets) when retrieving data on the local node. `libhdfs3` will throw errors (but will continue to work) when it finds that short-circuit read is not enabled. To silence these complaints, and to improve performance, add the following configuration to `hdfs-site.xml` **and to `hdfs-client.xml`**
+  
+    ```
+  <property>
+    <name>dfs.client.read.shortcircuit</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>dfs.domain.socket.path</name>
+    <value>/var/lib/hadoop-hdfs/dn_socket</value>
+  </property>
+    ``` 
+    Next, at each client, set the `LIBHDFS3_CONF` variable to point to the `hdfs-client.xml` file:
+
+    ```
+  export LIBHDFS3_CONF=$HADOOP_HOME/etc/hadoop/hdfs-client.xml
+    ```
+
+<a name="examples"></a>
+## Examples
+We explain how to run CIFAR10 and MNIST examples. Before training, the data must be uploaded to HDFS. 
+
+### CIFAR10
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+    * Change `job.conf` to use HDFS: in `examples/cifar10/job.conf`, set `backend` property to `hdfsfile`
+    * Create and upload data: 
+
+    ```
+    cd examples/cifar10
+    cp Makefile.example Makefile
+    make create
+    hadoop dfs -mkdir /examples/cifar10
+    hadoop dfs -copyFromLocal cifar-10-batches-bin /examples/cifar10/
+    ```
+    If successful, the files should be seen in HDFS via `hadoop dfs -ls /examples/cifar10`
+
+2. Training:
+    * Make sure `conf/singa.conf` has the correct path to the Zookeeper service:
+
+    ```
+    zookeeper_host: "node0:2181"
+    ```
+
+    * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+    ```
+    // train layer
+    path: "hdfs://node0:9000/examples/cifar10/train_data.bin"
+    mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+    // test layer
+    path: "hdfs://node0:9000/examples/cifar10/test_data.bin"
+    mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+    ```
+
+    * Start training: execute the following command at every node
+
+    ```
+    ./singa -conf examples/cifar10/job.conf -singa_conf singa.conf -singa_job 0
+    ```
+
+### MNIST
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+    * Change `job.conf` to use HDFS: in `examples/mnist/job.conf`, set `backend` property to `hdfsfile`
+    * Create and upload data:
+
+    ```
+    cd examples/mnist
+    cp Makefile.example Makefile
+    make create
+    make compile
+    ./create_data.bin train-images-idx3-ubyte train-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/train_data.bin
+    ./create_data.bin t10k-images-idx3-ubyte t10k-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/test_data.bin
+    ```
+    If successful, the files should be seen in HDFS via `hadoop dfs -ls /examples/mnist`
+
+2. Training:
+    * Make sure `conf/singa.conf` has the correct path to the Zookeeper service:
+
+    ```
+    zookeeper_host: "node0:2181"
+    ```
+
+    * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+    ```
+    // train layer
+    path: "hdfs://node0:9000/examples/mnist/train_data.bin"
+    // test layer
+    path: "hdfs://node0:9000/examples/mnist/test_data.bin"
+    ```
+
+    * Start training: execute the following command at every node
+
+    ```
+    ./singa -conf examples/mnist/job.conf -singa_conf singa.conf -singa_job 0
+    ```

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/hybrid.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/hybrid.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/hybrid.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/hybrid.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,83 @@
+# Hybrid Parallelism
+
+---
+
+## User Guide
+
+SINGA supports different parallelism options for distributed training.
+Users just need to configure it in the job configuration.
+
+Both `NetProto` and `LayerProto` have a field `partition_dim` to control the parallelism option:
+
+  * `partition_dim=0`: neuralnet/layer is partitioned on data dimension, i.e., each worker processes a subset of data records.
+  * `partition_dim=1`: neuralnet/layer is partitioned on feature dimension, i.e., each worker maintains a subset of feature parameters.
+
+The `partition_dim` field in `NetProto` applies to all layers, unless a layer has its own `partition_dim` field set.
+
+If we want data parallelism for the whole model, just leave `partition_dim` at its default (which is 0), or configure job.conf like:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: ... 
+    type: ...
+  }
+  ...
+}
+```
+
+With hybrid parallelism, layers can be partitioned on either the data dimension or the feature dimension.
+For example, if we want a specific layer partitioned on the feature dimension, just configure it like:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: "layer1_partition_on_data_dimension"
+    type: ...
+  }
+  layer {
+    name: "layer2_partition_on_feature_dimension"
+    type: ...
+    partition_dim: 1
+  }
+  ...
+}
+```
+
+## Developer Guide
+
+To support hybrid parallelism, after SINGA reads the user's model and partition configuration, a set of connection layers is automatically added between layers when needed:
+
+* `BridgeSrcLayer` & `BridgeDstLayer` are added when two connected layers are not in the same machine. They are paired and are responsible for sending data/gradient to the other side during each iteration.
+
+* `ConcateLayer` is added when there are multiple source layers. It combines their feature blobs along a given dimension.
+
+* `SliceLayer` is added when there are multiple dest layers, each of which only needs a subset (slice) of this layer's feature blob.
+
+* `SplitLayer` is added when there are multiple dest layers, each of which needs the whole feature blob.
+
+The following is the logic used in our code to add connection layers:
+
+```
+Add Slice, Concate, Split Layers for Hybrid Partition
+
+All cases are as follows:
+src_pdim | dst_pdim | connection_type | Action
+    0    |     0    |     OneToOne    | Direct Connection
+    1    |     1    |     OneToOne    | Direct Connection
+    0    |     0    |     OneToAll    | Direct Connection
+    1    |     0    |     OneToOne    | Slice -> Concate
+    0    |     1    |     OneToOne    | Slice -> Concate
+    1    |     0    |     OneToAll    | Slice -> Concate
+    0    |     1    |     OneToAll    | Split -> Concate
+    1    |     1    |     OneToAll    | Split -> Concate
+
+Logic:
+dst_pdim = 1 && OneToAll ?
+  (YES) Split -> Concate
+  (NO)  src_pdim = dst_pdim ?
+          (YES) Direct Connection
+          (NO)  Slice -> Concate
+```
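+
+The same decision logic can be written compactly; the names below are
+illustrative, not the actual SINGA code,
+
+```
+#include <string>
+
+enum ConnectionType { kOneToOne, kOneToAll };
+
+// Decide which connection layers to insert between a source layer and a
+// destination layer, given their partition dimensions (0 = data, 1 = feature)
+// and the connection type; this mirrors the table above.
+std::string ConnectionLayers(int src_pdim, int dst_pdim, ConnectionType type) {
+  if (dst_pdim == 1 && type == kOneToAll) return "Split -> Concate";
+  if (src_pdim == dst_pdim) return "Direct Connection";
+  return "Slice -> Concate";
+}
+```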

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/index.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/index.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/index.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/index.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,26 @@
+# Latest Documentation
+
+---
+
+* [Introduction](overview.html)
+* [Installation](installation.html)
+* [Quick Start](quick-start.html)
+* [Programming Guide](programming-guide.html)
+    * [NeuralNet](neural-net.html)
+        * [Layer](layer.html)
+        * [Param](param.html)
+    * [TrainOneBatch](train-one-batch.html)
+    * [Updater](updater.html)
+* [Distributed Training](distributed-training.html)
+* [Data Preparation](data.html)
+* [Checkpoint and Resume](checkpoint.html)
+* [Python Binding](python.html)
+* [Performance test and Feature extraction](test.html)
+* [Training on GPU](gpu.html)
+* [Examples](examples.html)
+    * Feed-forward models
+        * [CNN](cnn.html)
+        * [MLP](mlp.html)
+    * [RBM + Auto-encoder](rbm.html)
+    * [Vanilla RNN for language modelling](rnn.html)
+    * [Char-RNN](general-rnn.html)

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/installation.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/installation.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/installation.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/installation.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,9 @@
+# Installation 
+
+---
+
+Currently, there are two ways to install SINGA: build directly from source, and build a Docker image. 
+
+* [Build SINGA directly from source](installation_source.html) 
+* [Build SINGA as a Docker container](docker.html)
+

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/installation_source.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/installation_source.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/installation_source.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/installation_source.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,260 @@
+# Building SINGA from source
+
+---
+
+## Dependencies
+
+SINGA is developed and tested on Linux platforms.
+
+The following dependent libraries are required:
+
+  * glog version 0.3.3
+
+  * google-protobuf version 2.6.0
+
+  * openblas version >= 0.2.10
+
+  * zeromq version >= 3.2
+
+  * czmq version >= 3
+
+  * zookeeper version 3.4.6
+
+
+Optional dependencies include:
+
+  * lmdb version 0.9.10
+
+
+You can install all dependencies into the $PREFIX folder by
+
+    # make sure you are in the thirdparty folder
+    cd thirdparty
+    ./install.sh all $PREFIX
+
+If $PREFIX is not a system path (e.g., /usr/local/), please export the following
+variables to continue the building instructions,
+
+    export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
+    export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+    export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+    export PATH=$PREFIX/bin:$PATH
+
+More details on using this script are given below.
+
+## Building SINGA from source
+
+SINGA is built using GNU autotools. GCC (version >= 4.8) is required.
+There are two ways to build SINGA,
+
+  * If you want to use the latest code, please clone it from
+  [Github](https://github.com/apache/incubator-singa.git) and execute
+  the following commands,
+
+        $ git clone git@github.com:apache/incubator-singa.git
+        $ cd incubator-singa
+        $ ./autogen.sh
+        $ ./configure
+        $ make
+
+  Note: It is an oversight that we forgot to delete the singa repo under the [nusinga](https://github.com/orgs/nusinga)
+  account after we became an Apache Incubator project -- the source
+  in that repo was not up to date, and we apologize for any inconvenience.
+
+  * If you download a release package, please follow the instructions below,
+
+        $ tar xvf singa-xxx
+        $ cd singa-xxx
+        $ ./configure
+        $ make
+
+    Some features of SINGA depend on external libraries. These features can be
+    compiled with `--enable-<feature>`.
+    For example, to build SINGA with lmdb support, you can run:
+
+        $ ./configure --enable-lmdb
+
+<!---
+Zhongle: please update the code to use the follow command
+
+    $ make test
+
+After compilation, you will find the binary file singatest. Just run it!
+More details about configure script can be found by running:
+
+		$ ./configure -h
+-->
+
+After compiling SINGA successfully, *libsinga.so* and the executable file
+*singa* will be generated in the *.libs/* folder.
+
+If some dependent libraries are missing (or not detected), you can use the
+following script to download and install them:
+
+<!---
+to be updated after zhongle changes the code to use
+
+    ./install.sh libname \-\-prefix=
+
+-->
+    # must go to the thirdparty folder
+    $ cd thirdparty
+    $ ./install.sh LIB_NAME PREFIX
+
+If you do not specify the installation path, the library will be installed in
+the default folder specified by the software itself.  For example, if you want
+to install the `zeromq` library in the default system folder, run
+
+    $ ./install.sh zeromq
+
+Or, if you want to install it into another folder,
+
+    $ ./install.sh zeromq PREFIX
+
+You can also install all dependencies in */usr/local* directory:
+
+    $ ./install.sh all /usr/local
+
+Here is a table showing the first arguments:
+
+    LIB_NAME              LIBRARY
+    czmq*                 czmq lib
+    glog                  glog lib
+    lmdb                  lmdb lib
+    OpenBLAS              OpenBLAS lib
+    protobuf              Google protobuf
+    zeromq                zeromq lib
+    zookeeper             Apache zookeeper
+
+*: Since `czmq` depends on `zeromq`, the script offers you one more argument to
+indicate the `zeromq` location.
+The installation command for `czmq` is:
+
+<!---
+to be updated to
+
+    $./install.sh czmq  \-\-prefix=/usr/local \-\-zeromq=/usr/local/zeromq
+-->
+
+    $./install.sh czmq  /usr/local -f=/usr/local/zeromq
+
+After the execution, `czmq` will be installed in */usr/local*. The last path
+specifies the path to zeromq.
+
+### FAQ
+* Q1: I get the error `./configure --> cannot find blas_segmm() function` even
+though I have installed OpenBLAS.
+
+  A1: This means the compiler cannot find the `OpenBLAS` library. If you installed
+  it to $PREFIX (e.g., /opt/OpenBLAS), then you need to export it as
+
+      $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+      # e.g.,
+      $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH
+
+
+* Q2: I get the error `cblas.h no such file or directory exists`.
+
+  A2: You need to include the folder containing cblas.h in CPLUS_INCLUDE_PATH,
+  e.g.,
+
+      $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+      # e.g.,
+      $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
+      # then reconfigure and make SINGA
+      $ ./configure
+      $ make
+
+
+* Q3: While compiling SINGA, I get the error `SSE2 instruction set not enabled`.
+
+  A3: You can try the following command:
+
+      $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
+
+
+* Q4: I get `ImportError: cannot import name enum_type_wrapper` from
+google.protobuf.internal when I try to import .py files.
+
+  A4: After installing google protobuf by `make install`, you should install the
+  python runtime libraries. Go to the protobuf source directory and run:
+
+      $ cd /PROTOBUF/SOURCE/FOLDER
+      $ cd python
+      $ python setup.py build
+      $ python setup.py install
+
+  You may need `sudo` when you try to install python runtime libraries in
+  the system folder.
+
+
+* Q5: I get a linking error caused by gflags.
+
+  A5: SINGA does not depend on gflags. But you may have installed glog with
+  gflags. In that case, you can reinstall glog using *thirdparty/install.sh* into
+  another folder and export the LDFLAGS and CPPFLAGS to include that folder.
+
+
+* Q6: While compiling SINGA and installing `glog` on Mac OS X, I get the fatal
+error `'ext/slist' file not found`.
+
+  A6: Please install `glog` individually and try:
+
+      $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'
+
+* Q7: When I start a training job, it reports error related with "ZOO_ERROR...zk retcode=-4...".
+
+  A7: This is because the zookeeper is not started. Please start the zookeeper service
+
+      $ ./bin/zk-service start
+
+  If the error persists, you probably do not have Java installed. You can simply
+  check it by
+
+      $ java -version
+
+* Q8: When I build OpenBLAS from source, I am told that I need a fortran compiler.
+
+  A8: You can compile OpenBLAS by
+
+      $ make ONLY_CBLAS=1
+
+  or install it using
+
+	    $ sudo apt-get install openblas-dev
+
+  or
+
+	    $ sudo yum install openblas-devel
+
+  It is worth noting that you need root access to run the last two commands.
+  Remember to set the environment variables to include the header and library
+  paths of OpenBLAS after installation (please refer to the Dependencies section).
+
+* Q9: When I build protocol buffer, it reports that GLIBCXX_3.4.20 is not found in /usr/lib64/libstdc++.so.6.
+
+  A9: This means the linker found libstdc++.so.6 but that library
+  belongs to an older version of GCC than was used to compile and link the
+  program. The program depends on code defined in
+  the newer libstdc++ that belongs to the newer version of GCC, so the linker
+  must be told how to find the newer libstdc++ shared library.
+  The simplest way to fix this is to find the correct libstdc++ and export it to
+  LD_LIBRARY_PATH. For example, if GLIBC++_3.4.20 is listed in the output of the
+  following command,
+
+      $ strings /usr/local/lib64/libstdc++.so.6 | grep GLIBCXX
+
+  then you just set your environment variable as
+
+      $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH
+
+* Q10: When I build glog, it reports that "src/logging_unittest.cc:83:20: error: ‘gflags’ is not a namespace-name"
+
+  A10: It may be that you have installed gflags with a different namespace, such
+  as "google", so glog cannot find the 'gflags' namespace.
+
+  Since gflags is not required to build glog, you can change the configure.ac
+  file to ignore gflags:
+
+  1. cd to glog src directory
+  2. change line 125 of configure.ac  to "AC_CHECK_LIB(gflags, main, ac_cv_have_libgflags=0, ac_cv_have_libgflags=0)"
+  3. autoreconf 
+ 
+  After this, you can build glog again.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/architecture.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/architecture.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/architecture.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/architecture.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,42 @@
+# SINGA Architecture
+
+---
+
+## Logical Architecture
+
+<img src="../../images/logical.png" style="width: 550px"/>
+<p><strong> Fig.1 - System architecture</strong></p>
+
+SINGA has a flexible architecture to support various distributed
+[training frameworks](frameworks.html) (synchronous or asynchronous training).
+Fig.1 shows the system architecture.
+A characteristic feature is that it consists of multiple server groups and worker groups.
+
+* **Server group**
+
+  A server group maintains a replica of the model parameters and is responsible for updating the parameters upon request from worker groups. Neighbouring server groups synchronize their parameters periodically. Typically, a server group consists of multiple servers, and each server handles a partition of the model parameters.
+
+* **Worker group**
+
+  Each worker group communicates with exactly one server group. A worker group is responsible for computing the parameter gradients. It trains a "complete" model replica over a partition of the data. All worker groups communicate with their corresponding server groups asynchronously, while workers within the same worker group run synchronously.
+
+There are many different ways to distribute the training across the workers within a group:
+
+  * **Model parallelism**: each worker computes a subset of the parameters over all the data assigned to the group.
+  * **Data parallelism**: each worker computes all the parameters over a subset of the assigned data.
+  * **Hybrid parallelism**: SINGA also supports hybrid parallelism, which combines the two approaches above.
+
+
+## Implementation
+
+In SINGA, servers and workers are execution units running in separate threads.
+They communicate using [messages](communication.html).
+Each process runs a main thread as a stub that collects local messages and forwards them to the corresponding receivers.
+
+Each server group and worker group maintains a *ParamShard* object, which is a "complete" replica of the model parameters.
+If workers and servers run in the same process,
+their *ParamShard* (partitions) are configured to share the same memory space.
+In this case, messages exchanged between different execution units only contain pointers to the data, which reduces the communication cost.
+In contrast, messages for inter-process communication contain the parameter values themselves.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/checkpoint.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/checkpoint.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/checkpoint.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/checkpoint.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,70 @@
+# CheckPoint
+
+---
+
+SINGA checkpoints model parameters onto disk periodically, according to a
+user-configured frequency. By checkpointing model parameters, we can
+
+  1. resume the training from the last checkpoint. For example, if
+    the program crashes before finishing all training steps, we can continue
+    the training using the checkpoint files.
+
+  2. use them to initialize a similar model. For example, the
+    parameters from training an RBM model can be used to initialize
+    a [deep auto-encoder](rbm.html) model.
+
+## Configuration
+
+Checkpointing is controlled by two configuration fields:
+
+* `checkpoint_after`, start checkpointing after this number of training steps,
+* `checkpoint_freq`, the frequency of checkpointing.
+
+For example,
+
+    # job.conf
+    checkpoint_after: 100
+    checkpoint_freq: 300
+    ...
+
+Checkpointing files are located at *WORKSPACE/checkpoint/stepSTEP-workerWORKERID*.
+*WORKSPACE* is configured in
+
+    cluster {
+      workspace:
+    }
+
+For the above configuration, after training for 700 steps, there would be
+two checkpointing files,
+
+    step400-worker0
+    step700-worker0
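+
+Concretely, a checkpoint is written every `checkpoint_freq` steps once the
+first `checkpoint_after` steps have passed, hence steps 400 and 700 above. A
+minimal sketch of this rule (an illustration, not SINGA code) is,
+
+    // Returns true at steps 400, 700, 1000, ... for after=100, freq=300.
+    bool ShouldCheckpoint(int step, int after, int freq) {
+      return step > after && (step - after) % freq == 0;
+    }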
+
+## Application - resuming training
+
+We can resume the training from the last checkpoint (i.e., step 700) by,
+
+    ./bin/singa-run.sh -conf JOB_CONF -resume
+
+There is no change to the job configuration.
+
+## Application - model initialization
+
+We can also use the checkpointing file from step 400 to initialize
+a new model by configuring the new job as,
+
+    # job.conf
+    checkpoint : "WORKSPACE/checkpoint/step400-worker0"
+    ...
+
+If there are multiple checkpointing files for the same snapshot due to model
+partitioning, all the checkpointing files should be added,
+
+    # job.conf
+    checkpoint : "WORKSPACE/checkpoint/step400-worker0"
+    checkpoint : "WORKSPACE/checkpoint/step400-worker1"
+    ...
+
+The training command is the same as starting a new job,
+
+    ./bin/singa-run.sh -conf JOB_CONF

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/cnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/cnn.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/cnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/cnn.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,217 @@
+# CNN Example
+
+---
+
+Convolutional neural network (CNN) is a type of feed-forward artificial neural
+network widely used for image and video classification. In this example, we will
+use a deep CNN model to do image classification for the
+[CIFAR10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html).
+
+
+## Running instructions
+
+Please refer to the [installation](installation.html) page for
+instructions on building SINGA, and the [quick start](quick-start.html)
+for instructions on starting zookeeper.
+
+We have provided scripts for preparing the training and test dataset in *examples/cifar10/*.
+
+    # in examples/cifar10
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+
+We can start the training by
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+You should see output like
+
+    Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
+    Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
+    E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 33849)
+    E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
+    E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
+    E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
+    E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
+    E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
+    E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
+    E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
+    E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
+    E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
+    E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
+    E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
+    E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
+
+After training for some steps (depending on the setting) or when the job is
+finished, SINGA will [checkpoint](checkpoint.html) the model parameters.
+
+## Details
+
+To train a model in SINGA, you need to prepare the datasets,
+and a job configuration which specifies the neural net structure, training
+algorithm (BP or CD), SGD update algorithm (e.g. Adagrad),
+number of training/test steps, etc.
+
+### Data preparation
+
+Before using SINGA, you need to write a program to convert the dataset
+into a format that SINGA can read. Please refer to the
+[Data Preparation](data.html#example---cifar-dataset) to get details about
+preparing this CIFAR10 dataset.
+
+### Neural net
+
+Figure 1 shows the net structure of the CNN model used in this example, which
+follows [Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg).
+The dashed circle represents one feature transformation stage, which generally
+has four layers as shown in the figure. Sometimes the rectifier layer and normalization layer
+are omitted or swapped in one stage. For this example, there are 3 such stages.
+
+Next we follow the guide in [neural net page](neural-net.html)
+and [layer page](layer.html) to write the neural net configuration.
+
+<div style = "text-align: center">
+<img src = "../images/example-cnn.png" style = "width: 200px"/> <br/>
+<strong>Figure 1 - Net structure of the CNN example.</strong>
+</div>
+
+* We configure an input layer to read the training/testing records from a disk file.
+
+        layer{
+          name: "data"
+          type: kRecordInput
+          store_conf {
+            backend: "kvfile"
+            path: "examples/cifar10/train_data.bin"
+            mean_file: "examples/cifar10/image_mean.bin"
+            batchsize: 64
+            random_skip: 5000
+            shape: 3
+            shape: 32
+            shape: 32
+           }
+           exclude: kTest  # exclude this layer for the testing net
+        }
+        layer{
+          name: "data"
+          type: kRecordInput
+          store_conf {
+            backend: "kvfile"
+            path: "examples/cifar10/test_data.bin"
+            mean_file: "examples/cifar10/image_mean.bin"
+            batchsize: 100
+            shape: 3
+            shape: 32
+            shape: 32
+           }
+         exclude: kTrain # exclude this layer for the training net
+        }
+
+
+* We configure layers for the feature transformation as follows
+(all layers are built-in layers in SINGA; hyper-parameters of these layers are set according to
+[Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg)).
+
+        layer {
+          name: "conv1"
+          type: kConvolution
+          srclayers: "data"
+          convolution_conf {... }
+          ...
+        }
+        layer {
+          name: "pool1"
+          type: kPooling
+          srclayers: "conv1"
+          pooling_conf {... }
+        }
+        layer {
+          name: "relu1"
+          type: kReLU
+          srclayers:"pool1"
+        }
+        layer {
+          name: "norm1"
+          type: kLRN
+          lrn_conf {... }
+          srclayers:"relu1"
+        }
+
+  The configurations for another 2 stages are omitted here.
+
+* There is an [inner product layer](layer.html#innerproductlayer)
+after the 3 transformation stages, which is
+configured with 10 output units, i.e., the total number of labels. The weight
+matrix Param is configured with a large weight decay scale to reduce over-fitting.
+
+        layer {
+          name: "ip1"
+          type: kInnerProduct
+          srclayers:"pool3"
+          innerproduct_conf {
+            num_output: 10
+          }
+          param {
+            name: "w4"
+            wd_scale:250
+            ...
+          }
+          param {
+            name: "b4"
+            ...
+          }
+        }
+
+* The last layer is a [Softmax loss layer](layer.html#softmaxloss)
+
+        layer{
+          name: "loss"
+          type: kSoftmaxLoss
+          softmaxloss_conf{ topk:1 }
+          srclayers:"ip1"
+          srclayers: "data"
+        }
+
+### Updater
+
+The [normal SGD updater](updater.html#updater) is selected.
+The learning rate decreases in a staircase fashion, and is configured using the
+[kFixedStep](updater.html#kfixedstep) type.
+
+        updater{
+          type: kSGD
+          weight_decay:0.004
+          learning_rate {
+            type: kFixedStep
+            fixedstep_conf:{
+              step:0             # lr for step 0-60000 is 0.001
+              step:60000         # lr for step 60000-65000 is 0.0001
+              step:65000         # lr for step 65000- is 0.00001
+              step_lr:0.001
+              step_lr:0.0001
+              step_lr:0.00001
+            }
+          }
+        }
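+
+The comments in the configuration above follow from how kFixedStep selects the
+learning rate. Roughly (an illustrative sketch, not the actual SINGA code),
+
+    #include <vector>
+
+    // Pick step_lr[k] for the largest k with steps[k] <= step.
+    float FixedStepLR(const std::vector<int>& steps,
+                      const std::vector<float>& step_lrs, int step) {
+      float lr = step_lrs.front();
+      for (size_t k = 0; k < steps.size(); ++k)
+        if (step >= steps[k]) lr = step_lrs[k];
+      return lr;
+    }
+    // steps {0, 60000, 65000} and lrs {0.001, 0.0001, 0.00001} give
+    // FixedStepLR(..., 62000) == 0.0001, matching the comments above.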
+
+### TrainOneBatch algorithm
+
+The CNN model is a feed-forward model, and thus should be configured to use the
+[Back-propagation algorithm](train-one-batch.html#back-propagation).
+
+    train_one_batch {
+      alg: kBP
+    }
+
+### Cluster setting
+
+The following configuration sets a single worker and server for training.
+The [training frameworks](frameworks.html) page introduces configurations of
+several distributed training frameworks.
+
+    cluster {
+      nworker_groups: 1
+      nserver_groups: 1
+    }

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/code-structure.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/code-structure.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/code-structure.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/code-structure.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,76 @@
+# Code Structure
+
+---
+
+<!--
+
+### Worker Side
+
+#### Main Classes
+
+<img src="../images/code-structure/main.jpg" style="width: 550px"/>
+
+* **Worker**: start the solver to conduct training or resume from previous training snapshots.
+* **Solver**: construct the neural network and run training algorithms over it. Validation and testing is also done by the solver along the training.
+* **TableDelegate**: delegate for the parameter table physically stored in parameter servers.
+    it runs a thread to communicate with table servers for parameter transferring.
+* **Net**: the neural network consists of multiple layers constructed from input configuration file.
+* **Layer**: the core abstraction, read data (neurons) from connecting layers, and compute the data
+    of itself according to layer specific ComputeFeature functions. Data from the bottom layer is forwarded
+    layer by layer to the top.
+
+#### Data types
+
+<img src="../images/code-structure/layer.jpg" style="width: 700px"/>
+
+* **ComputeFeature**: read data (neurons) from in-coming layers, and compute the data
+    of itself according to layer type. This function can be overrided to implement different
+    types layers.
+* **ComputeGradient**: read gradients (and data) from in-coming layers and compute
+    gradients of parameters and data w.r.t the learning objective (loss).
+
+We adpat the implementation for **PoolingLayer**, **Im2colLayer** and **LRNLayer** from [Caffe](http://caffe.berkeleyvision.org/).
+
+
+<img src="../images/code-structure/darray.jpg" style="width: 400px"/>
+
+* **DArray**: provide the abstraction of distributed array on multiple nodes,
+    supporting array/matrix operations and element-wise operations. Users can use it as a local structure.
+* **LArray**: the local part for the DArray. Each LArray is treated as an
+    independent array, and support all array-related operations.
+* **MemSpace**: manage the memory used by DArray. Distributed memory are allocated
+    and managed by armci. Multiple DArray can share a same MemSpace, the memory
+    will be released when no DArray uses it anymore.
+* **Partition**: maintain both global shape and local partition information.
+    used when two DArray are going to interact.
+* **Shape**: basic class for representing the scope of a DArray/LArray
+* **Range**: basic class for representing the scope of a Partition
+
+### Parameter Server
+
+#### Main classes
+
+<img src="../images/code-structure/uml.jpg" style="width: 750px"/>
+
+* **NetworkService**: provide access to the network (sending and receiving messages). It maintains a queue for received messages, implemented by NetworkQueue.
+* **RequestDispatcher**: pick up next message (request) from the queue, and invoked a method (callback) to process them.
+* **TableServer**: provide access to the data table (parameters). Register callbacks for different types of requests to RequestDispatcher.
+* **GlobalTable**: implement the table. Data is partitioned into multiple Shard objects per table. User-defined consistency model supported by extending TableServerHandler for each table.
+
+#### Data types
+
+<img src="../images/code-structure/type.jpg" style="width: 400px"/>
+
+Table related messages are either of type **RequestBase** which contains different types of request, or of type **TableData** containing a key-value tuple.
+
+#### Control flow and thread model
+
+<img src="../images/code-structure/threads.jpg" alt="uml" style="width: 1000px"/>
+
+The figure above shows how a GET request sent from a worker is processed by the
+table server. The control flow for other types of requests is similar. At
+the server side, there are at least 3 threads running at any time: two by
+NetworkService for sending and receiving message, and at least one by the
+RequestDispatcher for dispatching requests.
+
+-->

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/communication.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/communication.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/communication.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/communication.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,453 @@
+# Communication
+
+---
+
+Different messaging libraries have different benefits and drawbacks. For instance,
+MPI provides fast message passing between GPUs (using GPUDirect), but it does not
+support fault tolerance well. On the contrary, systems using ZeroMQ can be
+fault-tolerant, but ZeroMQ does not support GPUDirect. ZeroMQ also lacks MPI's
+AllReduce function, which is efficient for data aggregation in distributed
+training. In SINGA, we provide general messaging APIs for
+communication between threads within a process and across processes, and let
+users choose the underlying implementation (MPI or ZeroMQ) that meets their requirements.
+
+SINGA's messaging library consists of two components, namely the message and
+the socket to send and receive messages. **Socket** refers to a
+SINGA-defined data structure instead of the Linux socket.
+We will introduce the two components in detail with the following figure as an
+example architecture.
+
+<img src="../images/arch/arch2.png" style="width: 550px"/>
+<img src="../images/arch/comm.png" style="width: 550px"/>
+<p><strong> Fig.1 - Example physical architecture and network connection</strong></p>
+
+Fig.1 shows an example physical architecture and its network connection.
+The [architecture page](architecture.html) has a detailed description of the
+architecture. Each process consists of one main thread running the stub and
+multiple background threads running the worker and server tasks. The stub of
+the main thread forwards messages among threads. The worker and server tasks
+are performed by the background threads.
+
+## Message
+
+<object type="image/svg+xml" style="width: 100px" data="../images/msg.svg" > Not
+supported </object>
+<p><strong> Fig.2 - Logical message format</strong></p>
+
+Fig.2 shows the logical message format which has two parts, the header and the
+content. The message header includes the sender's and receiver's IDs, each consisting of
+the group ID and the worker/server ID within the group. The stub forwards
+messages by looking up an address table based on the receiver's ID.
+There are two sets of messages according to the message type defined below.
+
+  * kGet/kPut/kRequest/kSync for messages about parameters
+
+  * kFeaBlob/kGradBlob for messages about transferring feature and gradient
+  blobs of one layer to its neighboring layer
+
+There is a target ID in the header. If the message body is parameters,
+the target ID is then the parameter ID. Otherwise the message is related to
+layer feature or gradient, and the target ID consists of the layer ID and the
+blob ID of that layer. The message content has multiple frames to store the
+parameter or feature data.
+
+The API for the base Msg is:
+
+    /**
+     * Msg used to transfer Param info (gradient or value), feature blob, etc
+     * between workers, stubs and servers.
+     *
+     * Each msg has a source addr and dest addr identified by a unique integer.
+     * It is also associated with a target field (value and version) for ease of
+     * getting some meta info (e.g., parameter id) from the msg.
+     *
+     * Other data is added into the message as frames.
+     */
+    class Msg {
+     public:
+      ~Msg();
+      Msg();
+      /**
+       * Construct the msg providing source and destination addr.
+       */
+      Msg(int src, int dst);
+      /**
+       * Copy constructor.
+       */
+      Msg(const Msg& msg);
+      /**
+       * Swap the src/dst addr
+       */
+      void SwapAddr();
+      /**
+       * Add a frame (a chunk of bytes) into the message
+       */
+      void AddFrame(const void* addr, int nBytes);
+      /**
+       * @return num of bytes of the current frame.
+       */
+      int FrameSize();
+      /**
+       * @return the pointer to the current frame data.
+       */
+      void* FrameData();
+      /**
+       * @return the data of the current frame as c string
+       */
+      char* FrameStr();
+      /**
+       * Move the cursor to the first frame.
+       */
+      void FirstFrame();
+      /**
+       * Move the cursor to the last frame.
+       */
+      void LastFrame();
+      /**
+       * Move the cursor to the next frame
+       * @return true if the next frame is not NULL; otherwise false
+       */
+      bool NextFrame();
+      /**
+       *  Add a 'format' frame to the msg (like CZMQ's zsock_send).
+       *
+       *  The format is a string that defines the type of each field.
+       *  The format can contain any of these characters, each corresponding to
+       *  one or two arguments:
+       *  i = int (signed)
+       *  1 = uint8_t
+       *  2 = uint16_t
+       *  4 = uint32_t
+       *  8 = uint64_t
+       *  p = void * (sends the pointer value, only meaningful over inproc)
+       *  s = char**
+       *
+       *  Returns size of the added content.
+       */
+      int AddFormatFrame(const char *format, ...);
+      /**
+       *  Parse the current frame added using AddFormatFrame(const char*, ...).
+       *
+       *  The format is a string that defines the type of each field.
+       *  The format can contain any of these characters, each corresponding to
+       *  one or two arguments:
+       *  i = int (signed)
+       *  1 = uint8_t
+       *  2 = uint16_t
+       *  4 = uint32_t
+       *  8 = uint64_t
+       *  p = void * (sends the pointer value, only meaningful over inproc)
+       *  s = char**
+       *
+       *  Returns size of the parsed content.
+       */
+      int ParseFormatFrame(const char* format, ...);
+
+    #ifdef USE_ZMQ
+      void ParseFromZmsg(zmsg_t* msg);
+      zmsg_t* DumpToZmsg();
+    #endif
+
+      /**
+       * @return msg size in terms of bytes, ignoring meta info.
+       */
+      int size() const;
+      /**
+       * Set source addr.
+       * @param addr uniquely identifies one worker/server/stub in the current job
+       */
+      void set_src(int addr) { src_ = addr; }
+      /**
+       * @return source addr.
+       */
+      int src() const { return src_; }
+      /**
+       * Set destination addr.
+       * @param addr uniquely identifies one worker/server/stub in the current job
+       */
+      void set_dst(int addr) { dst_ = addr; }
+      /**
+       * @return dst addr.
+       */
+      int dst() const { return dst_; }
+      /**
+       * Set msg type, e.g., kPut, kGet, kUpdate, kRequest
+       */
+      void set_type(int type) { type_ = type; }
+      /**
+       * @return msg type.
+       */
+      int type() const { return type_; }
+      /**
+       * Set msg target.
+       *
+       * One msg has a target to identify some entity in worker/server/stub.
+       * The target is associated with a version, e.g., Param version.
+       */
+      void set_trgt(int val, int version) {
+        trgt_val_ = val;
+        trgt_version_ = version;
+      }
+      int trgt_val() const {
+        return trgt_val_;
+      }
+      int trgt_version() const {
+        return trgt_version_;
+      }
+
+    };
+
+For a Msg object to be routed, the source and destination addresses must be
+attached by calling the set_src and set_dst methods of the Msg object. The
+address values passed to these two methods can be constructed and parsed via
+a set of helper functions, shown below.
+
+    /**
+     * Wrapper to generate message address
+     * @param grp worker/server group id
+     * @param id_or_proc worker/server id or procs id
+     * @param type msg type
+     */
+    inline int Addr(int grp, int id_or_proc, int type) {
+      return (grp << 16) | (id_or_proc << 8) | type;
+    }
+
+    /**
+     * Parse group id from addr.
+     *
+     * @return group id
+     */
+    inline int AddrGrp(int addr) {
+      return addr >> 16;
+    }
+    /**
+     * Parse worker/server id from addr.
+     *
+     * @return id
+     */
+    inline int AddrID(int addr) {
+      static const int mask = (1 << 8) - 1;
+      return (addr >> 8) & mask;
+    }
+
+    /**
+     * Parse worker/server procs from addr.
+     *
+     * @return procs id
+     */
+    inline int AddrProc(int addr) {
+      return AddrID(addr);
+    }
+    /**
+     * Parse msg type from addr
+     * @return msg type
+     */
+    inline int AddrType(int addr) {
+      static const int mask = (1 << 8) - 1;
+      return addr & mask;
+    }
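+
+As a usage illustration, the following sketch composes a routable parameter
+request (the constants and IDs below are placeholders for illustration; the
+actual message types and entity types are defined by SINGA's enums):
+
+    // Assumed placeholder constants, not SINGA's actual enum values.
+    const int kGet = 2;         // message type: parameter get request
+    const int kWorkerType = 1;  // entity type encoded into the address
+    const int kServerType = 2;
+
+    Msg* MakeGetRequest(int param_id, int version) {
+      int src = Addr(0, 1, kWorkerType);  // worker group 0, worker 1
+      int dst = Addr(0, 0, kServerType);  // server group 0, server 0
+      Msg* msg = new Msg(src, dst);
+      msg->set_type(kGet);
+      msg->set_trgt(param_id, version);   // identify the Param and version
+      return msg;  // ownership passes to the sending socket later
+    }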
+
+
+## Socket
+
+In SINGA, there are two types of sockets, the Dealer Socket and the Router
+Socket, whose names are adapted from ZeroMQ. All connections are of the same type, i.e.,
+Dealer<-->Router. The communication between dealers and routers are
+asynchronous. In other words, one Dealer
+socket can talk with multiple Router sockets, and one Router socket can talk
+with multiple Dealer sockets.
+
+### Base Socket
+
+The basic functions of a Singa Socket are to send and receive messages. The APIs
+are:
+
+    class SocketInterface {
+     public:
+      virtual ~SocketInterface() {}
+      /**
+        * Send a message to connected socket(s), non-blocking. The message
+        * will be deallocated after sending, thus should not be used after
+        * calling Send();
+        *
+        * @param msg The message to be sent
+        * @return 1 on successfully queuing the message for sending; 0 on failure
+        */
+      virtual int Send(Msg** msg) = 0;
+      /**
+        * Receive a message from any connected socket.
+        *
+        * @return a message pointer if success; nullptr if failure
+        */
+      virtual Msg* Receive() = 0;
+      /**
+       * @return Identifier of the implementation dependent socket. E.g., zsock_t*
+       * for ZeroMQ implementation and rank for MPI implementation.
+       */
+      virtual void* InternalID() const = 0;
+    };
+
+A poller class is provided to enable asynchronous communication between
+routers and dealers. One can register a set of SocketInterface objects with a
+poller instance by calling its Add method, and then call the Wait method of
+the poller to wait for any registered SocketInterface object to be ready for
+sending or receiving messages. The APIs of the poller class are shown below.
+
+    class Poller {
+     public:
+      Poller();
+      Poller(SocketInterface* socket);
+      /**
+        * Add a socket for polling; Multiple sockets can be polled together by
+        * adding them into the same poller.
+        */
+      void Add(SocketInterface* socket);
+      /**
+        * Poll for all sockets added into this poller.
+        * @param duration Stop after this number of milliseconds.
+        * @return pointer to the socket that has a message in its receiving
+        * queue; nullptr if no socket has a message.
+        */
+      SocketInterface* Wait(int duration);
+
+      /**
+       * @return true if the poller is terminated due to process interrupt
+       */
+      virtual bool Terminated();
+    };
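+
+For example, a stub thread could multiplex over several sockets roughly as in
+the sketch below (the 100 ms timeout is an arbitrary illustrative value, and
+the dispatch logic is elided):
+
+    // Sketch of a forwarding loop; `router` and `dealer` are sockets
+    // created and connected elsewhere.
+    void ForwardLoop(SocketInterface* router, SocketInterface* dealer) {
+      Poller poller;
+      poller.Add(router);
+      poller.Add(dealer);
+      while (!poller.Terminated()) {
+        SocketInterface* ready = poller.Wait(100);
+        if (ready == nullptr) continue;  // timed out, nothing to read
+        Msg* msg = ready->Receive();
+        if (msg == nullptr) continue;
+        // ... look up msg->dst() in the address table and re-Send() the
+        // message on the socket that reaches that worker/server/stub.
+      }
+    }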
+
+
+### Dealer Socket
+
+The Dealer socket inherits from the base Socket. In Singa, every Dealer socket
+only connects to one Router socket as shown in Fig.1.  The connection is set up
+by connecting the Dealer socket to the endpoint of a Router socket.
+
+    class Dealer : public SocketInterface {
+     public:
+      /**
+       * @param id Local dealer ID within a procs if the dealer is from worker or
+       * server thread, starts from 1 (0 is used by the router); or the connected
+       * remote procs ID for inter-process dealers from the stub thread.
+       */
+      Dealer();
+      explicit Dealer(int id);
+      ~Dealer() override;
+      /**
+        * Setup the connection with the router.
+        *
+        * @param endpoint Identifier of the router. For an intra-process
+        * connection, the endpoint follows ZeroMQ's format, i.e., it starts
+        * with "inproc://"; since each process in Singa has one router, the
+        * endpoint is fixed to "inproc://router" for intra-process
+        * connections. For inter-process connections, the endpoint follows
+        * ZeroMQ's tcp format, i.e., IP:port, where IP is the address of the
+        * connected process.
+        * @return 1 if the connection is set up successfully; 0 otherwise
+        */
+      int Connect(const std::string& endpoint);
+      int Send(Msg** msg) override;
+      Msg* Receive() override;
+      void* InternalID() const override;
+    };
+
+### Router Socket
+
+The Router socket inherits from the base Socket. One Router socket connects to
+at least one Dealer socket. Upon receiving a message, the router forwards it to
+the appropriate dealer according to the receiver's ID of this message.
+
+    class Router : public SocketInterface {
+     public:
+      Router();
+      /**
+       * There is only one router per procs, hence its local id is 0 and is not set
+       * explicitly.
+       *
+       * @param bufsize Buffer at most this number of messages
+       */
+      explicit Router(int bufsize);
+      ~Router() override;
+      /**
+       * Setup the connection with dealers.
+       *
+       * It automatically binds to the endpoint for intra-process communication,
+       * i.e., "inproc://router".
+       *
+       * @param endpoint The identifier for the Dealer socket in another
+       * process to connect to. It has the format IP:Port, where IP is that
+       * of the host machine. If the endpoint is empty, all connections are
+       * intra-process.
+       * @return number of connected dealers.
+       */
+      int Bind(const std::string& endpoint);
+      /**
+       * If the destination socket has not connected yet, buffer the message.
+       */
+      int Send(Msg** msg) override;
+      Msg* Receive() override;
+      void* InternalID() const override;
+
+    };
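+
+Putting the two socket types together, an intra-process round trip might look
+like the sketch below (the dealer ID and the empty Bind endpoint follow the
+conventions in the comments above; error handling is omitted):
+
+    // Sketch: deliver a message from a Dealer to the Router in one process.
+    void IntraProcessRoundTrip(int src_addr, int dst_addr) {
+      Router router;                      // one router per procs, local id 0
+      router.Bind("");                    // empty endpoint: intra-process only
+      Dealer dealer(1);                   // worker/server dealer ids start at 1
+      dealer.Connect("inproc://router");  // fixed intra-process endpoint
+
+      Msg* msg = new Msg(src_addr, dst_addr);
+      dealer.Send(&msg);                  // Send() takes ownership of msg
+      Msg* received = router.Receive();   // router can now forward `received`
+      delete received;
+    }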
+
+## Implementation
+
+### ZeroMQ
+
+**Why [ZeroMQ](http://zeromq.org/)?** Our previous design used MPI for
+communication between Singa processes. But MPI is a poor choice when it comes
+to fault-tolerance, because failure at one node brings down the entire MPI
+cluster. ZeroMQ, on the other hand, is fault tolerant in the sense that one
+node failure does not affect the other nodes. ZeroMQ consists of several basic
+communication patterns that can be easily combined to create more complex
+network topologies.
+
+<img src="../images/msg-flow.png" style="width: 550px"/>
+<p><strong> Fig.3 - Messages flow for ZeroMQ</strong></p>
+
+The communication APIs of Singa are similar to the DEALER-ROUTER pattern of
+ZeroMQ. Hence we can easily implement the Dealer socket using ZeroMQ's DEALER
+socket, and Router socket using ZeroMQ's ROUTER socket.
+The intra-process can be implemented using ZeroMQ's inproc transport, and the
+inter-process can be implemented using the tcp transport (To exploit the
+Infiniband, we can use the sdp transport). Fig.3 shows the message flow using
+ZeroMQ as the underlying implementation. The messages sent from dealers have two
+frames for the message header, and one or more frames for the message content.
+The messages sent from routers have another frame for the identifier of the
+destination dealer.
+
+Besides the DEALER-ROUTER pattern, we may also implement the Dealer socket and
+Router socket using other ZeroMQ patterns. To be continued.
+
+### MPI
+
+Since MPI does not provide intra-process communication, we have to implement
+it inside the Router and Dealer socket. A simple solution is to allocate one
+message queue for each socket. Messages sent to a socket are inserted into
+that socket's queue. We create a SafeQueue class to ensure consistent
+concurrent access to the queues. All queues are created by the main thread
+and passed to the sockets' constructors via *args*.
+
+    /**
+     * A thread safe queue class.
+     * There would be multiple threads pushing messages into
+     * the queue and only one thread reading and popping the queue.
+     */
+    class SafeQueue {
+     public:
+      void Push(Msg* msg);
+      Msg* Front();
+      void Pop();
+      bool empty();
+    };
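+
+A minimal sketch of such a class using C++11 standard library primitives is
+shown below (the actual implementation may differ):
+
+    #include <mutex>
+    #include <queue>
+
+    // Thread-safe queue sketch: many producers, a single consumer.
+    class SafeQueue {
+     public:
+      void Push(Msg* msg) {
+        std::lock_guard<std::mutex> lock(mutex_);
+        queue_.push(msg);
+      }
+      Msg* Front() {
+        std::lock_guard<std::mutex> lock(mutex_);
+        return queue_.empty() ? nullptr : queue_.front();
+      }
+      void Pop() {
+        std::lock_guard<std::mutex> lock(mutex_);
+        if (!queue_.empty()) queue_.pop();
+      }
+      bool empty() {
+        std::lock_guard<std::mutex> lock(mutex_);
+        return queue_.empty();
+      }
+
+     private:
+      std::mutex mutex_;
+      std::queue<Msg*> queue_;
+    };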
+
+For inter-process communication, we serialize the message and call MPI's
+send/receive functions to transfer them. All inter-process connections are
+set up by MPI at the beginning. Consequently, the Connect and Bind functions do
+nothing for both inter-process and intra-process communication.
+
+MPI's AllReduce function is efficient for data aggregation in distributed
+training. For example, [DeepImage of Baidu](http://arxiv.org/abs/1501.02876)
+uses AllReduce to aggregate the parameter updates from all workers. It has a
+similar architecture to [Fig.2](architecture.html),
+where every process has a server group and is connected with all other
+processes. Hence, we can implement DeepImage in Singa simply by using MPI's
+AllReduce function for inter-process communication.
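+
+For reference, summing a gradient vector across all processes is a single
+call (a sketch assuming MPI has been initialized; the function and buffer
+names are illustrative):
+
+    #include <mpi.h>
+    #include <vector>
+
+    // Sketch: sum each element of `grad` across all MPI processes; every
+    // process receives the aggregated result in place.
+    void AggregateGradients(std::vector<float>* grad) {
+      MPI_Allreduce(MPI_IN_PLACE, grad->data(), static_cast<int>(grad->size()),
+                    MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
+    }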

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/data.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/data.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/data.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/data.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,98 @@
+# Data Preparation
+
+---
+
+SINGA uses input layers to load data.
+Users can store their data in any format (e.g., CSV or binary) and in any place
+(e.g., disk files or HDFS) as long as there are corresponding input layers that
+can read the data records and parse them.
+
+To make it easy for users, SINGA provides a [StoreInputLayer] to read data
+in the format of (string:key, string:value) tuples from a variety of sources.
+These sources are abstracted by a [Store]() class, which is a simplified
+version of the DB abstraction in Caffe. The base Store class provides the
+following operations for reading and writing tuples:
+
+    Open(string path, Mode mode); // open the store for kRead or kCreate or kAppend
+    Close();
+
+    Read(string* key, string* val); // read a tuple; return false if fail
+    Write(string key, string val);  // write a tuple
+    Flush();
+
+Currently, two implementations are provided, namely
+
+1. [KVFileStore] for storing tuples in [KVFile]() (a binary file).
+The *create_data.cc* files in *examples/cifar10* and *examples/mnist* provide
+examples of storing records using KVFileStore.
+
+2. [TextFileStore] for storing tuples in plain text file (one line per tuple).
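+
+With either implementation, tuples can be read back through the same base
+API; for instance (a sketch in the style of the examples below; the file
+path is illustrative and error handling is omitted):
+
+    KVFileStore store;
+    store.Open("train_data.bin", singa::io::kRead);
+    string key, val;
+    while (store.Read(&key, &val)) {
+      // parse `val`, e.g., via a protobuf record's ParseFromString(val)
+    }
+    store.Close();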
+
+The (key, value) tuples are parsed by subclasses of StoreInputLayer according
+to the format of the tuple:
+
+* [ProtoRecordInputLayer] parses the value field from one
+tuple into a [SingleLabelImageRecord], which is generated by Google Protobuf according
+to [common.proto]. It can be used to store features for images (e.g., using the pixel field)
+or other objects (using the data field). The key field is not used.
+
+* [CSVRecordInputLayer] parses one tuple as a CSV line (comma-separated values).
+
+
+## Using built-in record format
+
+SingleLabelImageRecord is a built-in record in SINGA for storing image features.
+It is used in the cifar10 and mnist examples.
+
+    message SingleLabelImageRecord {
+      repeated int32 shape = 1;                // e.g., 3 (RGB channels), 32 (rows), 32 (cols)
+      optional int32 label = 2;                // label
+      optional bytes pixel = 3;                // pixels
+      repeated float data = 4 [packed = true]; // e.g., mean values used for normalization
+    }
+
+Here we elaborate the data preparation for the [CIFAR-10 image dataset](http://www.cs.toronto.edu/~kriz/cifar.html).
+This dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class:
+50,000 training images and 10,000 test images, each with a single label.
+The dataset is stored in binary files with a specific format. SINGA provides
+[create_data.cc](https://github.com/apache/incubator-singa/blob/master/examples/cifar10/create_data.cc),
+which converts the images in those binary files into `SingleLabelImageRecord`s and inserts them into training and test stores.
+
+1. Download raw data. The following command will download the dataset into *cifar-10-batches-bin* folder.
+
+        # in SINGA_ROOT/examples/cifar10
+        $ cp Makefile.example Makefile   # an example makefile is provided
+        $ make download
+
+2. Fill one record for each image, and insert it to store.
+
+        KVFileStore store;
+        store.Open(output_file_path, singa::io::kCreate);
+
+        singa::SingleLabelImageRecord image;
+        for (int image_id = 0; image_id < 50000; image_id++) {
+          // fill the record with the label and pixel bytes parsed from the
+          // downloaded binary files (parsing code omitted here)
+          image.set_label(label);
+          image.set_pixel(pixels);  // raw bytes of the 3x32x32 image
+          string str;
+          image.SerializeToString(&str);
+          store.Write(to_string(image_id), str);
+        }
+        store.Flush();
+        store.Close();
+
+    The data store for the test data is created similarly.
+    In addition, the program computes the mean values of the image pixels
+    (not shown here) and inserts them into a SingleLabelImageRecord, which is
+    then written into another store.
+
+3. Compile and run the program. SINGA provides an example Makefile that contains instructions
+    for compiling the source code and linking it with *libsinga.so*. Users just execute the following command.
+
+        $ make create
+
+## Using user-defined record format
+
+If users cannot use the SingleLabelImageRecord or CSV record for their data,
+they can define their own record format, e.g., using Google Protobuf.
+A record can be written into a data store as long as it can be converted
+into a byte string. Correspondingly, subclasses of StoreInputLayer are
+required to parse user-defined records, as sketched below.
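+
+For illustration, the following sketch (the `MyRecord` class and its `text`
+field are hypothetical, assumed to be generated by Google Protobuf from a
+user-supplied .proto file) serializes a custom record and writes it through
+the Store API described above:
+
+    MyRecord record;                       // hypothetical user-defined record
+    record.set_text("an example feature"); // hypothetical field
+    string str;
+    record.SerializeToString(&str);        // convert the record to a byte string
+
+    KVFileStore store;
+    store.Open("my_data.bin", singa::io::kCreate);
+    store.Write("record-0", str);
+    store.Flush();
+    store.Close();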

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/debug.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/debug.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/debug.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/debug.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,29 @@
+# How to Debug
+
+---
+
+Since SINGA is developed on Linux using C++, GDB is the preferred debugging
+tool. To use GDB, the code must be compiled with the `-g` flag, which is enabled by
+
+    ./configure --enable-debug
+    make
+
+## Debugging for single process job
+
+If your job launches only one process, then use the default *conf/singa.conf*
+for debugging. The process will be launched locally.
+
+To debug, first start zookeeper if it is not already running, then launch GDB
+
+    # do this for only once
+    ./bin/zk-service.sh start
+    # do this every time
+    gdb .libs/singa
+
+Then set the command line arguments
+
+    set args -conf JOBCONF
+
+Now you can set your breakpoints and start running.
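+For example (the breakpoint location is illustrative; pick any SINGA
+function of interest):
+
+    (gdb) break main
+    (gdb) run
+    ...
+    (gdb) backtrace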
+
+## Debugging for jobs with multiple processes

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/distributed-training.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/distributed-training.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/distributed-training.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,30 @@
+# Distributed Training
+
+---
+
+SINGA is designed for the distributed training of large deep learning models for large-scale data analytics.
+
+For details of the SINGA architecture that enables distributed training, please refer to the links below.
+
+* [System Architecture](architecture.html)
+
+* [Training Frameworks](frameworks.html)
+
+* [System Communication](communication.html)
+
+To parallelize model training, SINGA supports various parallelism schemes (data parallelism, model parallelism, hybrid parallelism, etc.).
+
+* [Hybrid Parallelism](hybrid.html)
+
+SINGA is now integrated with Mesos, so distributed training can be run as a Mesos framework.
+The Mesos cluster can be set up from SINGA containers;
+we have prepared Docker images that bundle Mesos and SINGA.
+
+For details on preparing and starting a cluster, please refer to the link below.
+
+* [Distributed Training on Mesos](mesos.html)
+
+To guarantee scalability, SINGA can run on top of distributed storage systems.
+Currently, SINGA supports HDFS.
+
+* [Running SINGA on HDFS](hdfs.html)

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/jp/docker.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/jp/docker.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/jp/docker.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/jp/docker.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,192 @@
+# Building a SINGA Docker container
+
+This guide explains how to set up a development environment for SINGA using Docker. It requires only Docker to be installed. The resulting image contains the complete working environment for SINGA, and can then be used to set up a cluster environment over one or more physical nodes.
+
+1. [Build SINGA base](#build_base)
+2. [Build SINGA with Mesos and Hadoop](#build_mesos)
+3. [Pre-built images](#pre_built)
+4. [Launch and stop SINGA (stand alone mode)](#launch_stand_alone)
+5. [Launch pseudo-distributed SINGA on one node](#launch_pseudo)
+6. [Launch fully distributed SINGA on multiple nodes](#launch_distributed)
+
+---
+
+<a name="build_base"></a>
+#### Build SINGA base image
+ 
+````
+$ cd tool/docker/singa
+$ sudo docker build -t singa/base . 
+$ sudo docker images
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/base             latest              XXXX                XXX                 2.01 GB
+````
+
+The result is an image containing a built version of SINGA.
+
+   ![singa/base](http://www.comp.nus.edu.sg/~dinhtta/files/images_base.png)
+
+   *Figure 1. singa/base Docker image, containing library dependencies and SINGA built from source.*
+
+---
+
+<a name="build_mesos"></a>
+#### Build SINGA with Mesos and Hadoop
+````
+$ cd tool/docker/mesos
+$ sudo docker build -t singa/mesos .
+$ sudo docker images
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/mesos             latest              XXXX                XXX                 4.935 GB
+````
+   ![singa/mesos](http://www.comp.nus.edu.sg/~dinhtta/files/images_mesos.png#1)
+   
+   *Figure 2. singa/mesos Docker image, containing Hadoop and Mesos built on
+top of SINGA. The default namenode address for Hadoop is `node0:9000`*
+
+**Note:** a common failure observed during the build process is caused by network errors occurring while downloading dependencies. Simply re-run the build command.
+
+---
+
+<a name="pre_built"></a>
+#### Pre-built images on epiC cluster
+For users with access to the `epiC` cluster, there are pre-built and loaded Docker images at the following nodes:
+
+      ciidaa-c18
+      ciidaa-c19
+
+The available images at those nodes are:
+
+````
+REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
+singa/base             latest              XXXX                XXX                 2.01 GB
+singa/mesos            latest              XXXX                XXX                 4.935 GB
+weaveworks/weaveexec   1.1.1               XXXX                11 days ago         57.8 MB
+weaveworks/weave       1.1.1               XXXX                11 days ago         17.56 MB
+````
+
+---
+
+<a name="launch_stand_alone"></a>
+#### Launch and stop SINGA in stand-alone mode
+To launch a test environment for single-node SINGA training, simply start a
+container from the `singa/base` image. The following starts a container named
+`XYZ`, then launches a shell inside it:
+
+````
+$ sudo docker run -dt --name XYZ singa/base /usr/bin/supervisord
+$ sudo docker exec -it XYZ /bin/bash
+````
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_standalone.png#1)
+
+   *Figure 3. Launch SINGA in stand-alone mode: single node training*
+
+Inside the launched container, the SINGA source directory can be found at `/root/incubator-singa`. 
+
+**Stopping the container**
+
+````
+$ sudo docker stop XYZ
+$ sudo docker rm XYZ
+````
+
+---
+
+<a name="launch_pseudo"></a>
+#### Launch SINGA on pseudo-distributed mode (single node)
+To simulate a distributed environment on a single node, one can repeat the
+previous step multiple times, each time giving a different name to the
+container.  Network connections between these containers are already supported,
+thus SINGA instances/nodes in these containers can readily communicate with
+each other.
+
+The previous approach requires the user to start SINGA instances individually
+in each container. Although there is a bash script for that, we provide a
+better way. In particular, multiple containers can be started from the
+`singa/mesos` image, which already bundles Mesos and Hadoop with SINGA. Using
+Mesos makes it easy to launch, stop and monitor the distributed execution
+from a single container. Figure 4 shows `N+1` containers running concurrently
+on the local host.
+
+````
+$ sudo docker run -dt --name node0 singa/mesos /usr/bin/supervisord
+$ sudo docker run -dt --name node1 singa/mesos /usr/bin/supervisord
+...
+````
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_pseudo.png#1)
+   
+*Figure 4. Launch SINGA in pseudo-distributed mode: multiple SINGA nodes on a single machine*
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos
+guide](mesos.html)
+for details of how to start training with multiple SINGA instances. 
+
+**Important:** the container that assumes the role of Hadoop's namenode (and often Mesos's and Zookeeper's master node as well) **must** be named `node0`. Otherwise, the user must log in to individual containers and change the Hadoop configuration separately.
+ 
+---
+
+<a name="launch_distributed"></a>
+#### Launch SINGA on fully distributed mode (multiple nodes)
+The previous section has explained how to start a distributed environment on a
+single node. But running many containers on one node does not scale. When there
+are multiple physical hosts available, it is better to distribute the
+containers over them. 
+
+The only extra requirement for the fully distributed mode, as compared with the
+pseudo distributed mode, is that the containers from different hosts are able
+to transparently communicate with each other. In the pseudo distributed mode,
+the local Docker engine takes care of such communication. Here, we rely on
+[Weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) to make the
+communication transparent. The resulting architecture is shown below.  
+
+![Nothing](http://www.comp.nus.edu.sg/~dinhtta/files/images_full.png#1)
+   
+*Figure 5. Launch SINGA in fully distributed mode: multiple SINGA nodes over multiple machines*
+
+**Install Weave at all hosts**
+
+```
+$ curl -L git.io/weave -o /usr/local/bin/weave
+$ chmod a+x /usr/local/bin/weave
+```
+
+**Starting Weave**
+
+Suppose `node0` will be launched at the host with IP `111.222.111.222`.
+
++ At host `111.222.111.222`:
+
+          $ weave launch
+          $ eval "$(weave env)"  //if there's error, do `sudo -s` and try again
+
++ At other hosts:
+
+          $ weave launch 111.222.111.222
+          $ eval "$(weave env)" //if there's error, do `sudo -s` and try again
+
+**Starting containers**
+
+The user logs in to each host and starts the container (same as in [pseudo-distributed](#launch_pseudo) mode). Note that the container acting as the head node of the cluster must be named `node0` (and, in this example, be running at the host with IP `111.222.111.222`).
+
+**_Important_:** when other containers share the same host as `node0`, say `node1` and `node2`,
+there are additional changes to be made to `node1` and `node2`. In particular, log in to each container and edit
+the `/etc/hosts` file:
+
+````
+# modified by weave
+...
+X.Y.Z	node0 node0.bridge  //<- REMOVE this line
+..
+````
+This ensures that `node1` and `node2` resolve `node0`'s address correctly. By default,
+containers on the same host resolve each other's addresses via the Docker bridge. Instead, we want them to use
+the addresses given by Weave.
+
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos guide](mesos.html)
+for details of how to start training with multiple SINGA instances. 
+