Posted to commits@singa.apache.org by wa...@apache.org on 2015/09/28 08:40:15 UTC
svn commit: r1705605 - in /incubator/singa/site/trunk/content/markdown:
develop/schedule.md docs/checkpoint.md docs/debug.md docs/installation.md
docs/layer.md docs/programming-guide.md docs/quick-start.md docs/rbm.md
Author: wangwei
Date: Mon Sep 28 06:40:14 2015
New Revision: 1705605
URL: http://svn.apache.org/viewvc?rev=1705605&view=rev
Log:
update docs of rbm (.bin), layer and installation to be consistent with the code (README.md)
Modified:
incubator/singa/site/trunk/content/markdown/develop/schedule.md
incubator/singa/site/trunk/content/markdown/docs/checkpoint.md
incubator/singa/site/trunk/content/markdown/docs/debug.md
incubator/singa/site/trunk/content/markdown/docs/installation.md
incubator/singa/site/trunk/content/markdown/docs/layer.md
incubator/singa/site/trunk/content/markdown/docs/programming-guide.md
incubator/singa/site/trunk/content/markdown/docs/quick-start.md
incubator/singa/site/trunk/content/markdown/docs/rbm.md
Modified: incubator/singa/site/trunk/content/markdown/develop/schedule.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/develop/schedule.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/develop/schedule.md (original)
+++ incubator/singa/site/trunk/content/markdown/develop/schedule.md Mon Sep 28 06:40:14 2015
@@ -3,7 +3,7 @@
| Release | Module| Feature | Status |
|---------|---------|-------------|--------|
-| 0.1 September | Neural Network |1.1. Feed forward neural network, including CNN, MLP | done|
+| 0.1 Sep. | Neural Network |1.1. Feed forward neural network, including CNN, MLP | done|
| | |1.2. RBM-like model, including RBM | done|
| | |1.3. Recurrent neural network, including standard RNN | done|
| | Architecture |1.4. One worker group on single node (with data partition)| done|
@@ -14,15 +14,11 @@
| | |1.9. Load-balance among servers | done|
| | Failure recovery|1.10. Checkpoint and restore |done|
| | Tools|1.11. Installation with GNU auto tools| done|
-|0.2 October | Neural Network |2.1. Feed forward neural network, including auto-encoders, hinge loss layers, HDFS data layers||
-| | |2.2. RBM-like model, including DBM | |
-| | |2.3. Recurrent neural network, including LSTM| |
-| | |2.4. Model partition ||
-| | Communication |2.5. MPI||
-| | GPU |2.6. Single GPU ||
-| | |2.7. Multiple GPUs on single node||
-| | Resource Management |1.9. Integration with Mesos ||
-| | Architecture |2.8. Update to support GPUs
-| | Fault Tolerance|2.9. Node failure detection and recovery||
-| | Binding |2.9. Python binding ||
-| | User Interface |2.10. Web front-end for job submission and performance visualization||
+|0.2 Nov. | Neural Network |2.1. Feed forward neural network, including VGG model, CSV input layer, HDFS output layer, etc.||
+| | |2.2. Recurrent neural network, including GRU and LSTM| |
+| | |2.3. Model partition and hybrid partition||
+| | Configuration |2.4. Configuration helpers for popular models, e.g., CNN, MLP, Auto-encoders||
+| | Tools |2.5. Integration with Mesos for resource management||
+| | |2.6. Prepare Docker images for deployment||
+| | Binding |2.7. Python binding for major components ||
+| | GPU |2.8. Single node with multiple GPUs ||
Modified: incubator/singa/site/trunk/content/markdown/docs/checkpoint.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/checkpoint.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/checkpoint.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/checkpoint.md Mon Sep 28 06:40:14 2015
@@ -27,7 +27,7 @@ For example,
checkpoint_frequency: 300
...
-Checkpointing files are located at *WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin*.
+Checkpointing files are located at *WORKSPACE/checkpoint/stepSTEP-workerWORKERID*.
*WORKSPACE* is configured in
cluster {
@@ -37,8 +37,8 @@ Checkpointing files are located at *WORK
For the above configuration, after training for 700 steps, there would be
two checkpointing files,
- step400-worker0.bin
- step700-worker0.bin
+ step400-worker0
+ step700-worker0
## Application - resuming training
@@ -54,15 +54,15 @@ We can also use the checkpointing file f
a new model by configuring the new job as,
# job.conf
- checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
+ checkpoint : "WORKSPACE/checkpoint/step400-worker0"
...
If there are multiple checkpointing files for the same snapshot due to model
partitioning, all the checkpointing files should be added,
# job.conf
- checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
- checkpoint : "WORKSPACE/checkpoint/step400-worker1.bin"
+ checkpoint : "WORKSPACE/checkpoint/step400-worker0"
+ checkpoint : "WORKSPACE/checkpoint/step400-worker1"
...
The training command is the same as starting a new job,
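The renamed checkpoint files (now without the *.bin* suffix) follow the pattern *stepSTEP-workerWORKERID*; as an illustrative sketch (the workspace value is a placeholder, not a real path):

```shell
# Illustrative sketch of the new checkpoint naming scheme (no .bin suffix).
# "WORKSPACE" stands in for the workspace configured in the cluster section.
workspace="WORKSPACE"
step=400
worker=0
ckpt="${workspace}/checkpoint/step${step}-worker${worker}"
echo "$ckpt"   # WORKSPACE/checkpoint/step400-worker0
```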
Modified: incubator/singa/site/trunk/content/markdown/docs/debug.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/debug.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/debug.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/debug.md Mon Sep 28 06:40:14 2015
@@ -18,7 +18,7 @@ To debug, first start zookeeper if it is
# do this for only once
./bin/zk-service.sh start
# do this every time
- gdb ./bin/singa
+ gdb .libs/singa
Then set the command line arguments
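Inside gdb, the job configuration is passed via command-line arguments; a sketch of such a session (the flag spelling and conf path below are illustrative, adjust them to your job):

    (gdb) set args -conf=examples/cifar10/job.conf
    (gdb) run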
Modified: incubator/singa/site/trunk/content/markdown/docs/installation.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/installation.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/installation.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/installation.md Mon Sep 28 06:40:14 2015
@@ -26,7 +26,19 @@ Optional dependencies include:
* lmdb version 0.9.10
-SINGA comes with a script for installing the above libraries (see below).
+You can install all dependencies into the $PREFIX folder by
+
+ ./thirdparty/install.sh all $PREFIX
+
+If $PREFIX is not a system path (e.g., /usr/local/), please export the following
+variables before continuing with the build instructions,
+
+ export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
+ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+ export PATH=$PREFIX/bin:$PATH
+
+More details on using this script are given below.
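As a quick sanity check that the exports took effect, one can inspect the first (highest-priority) entry of each variable; the $PREFIX value below is an illustrative example, not a path from the SINGA docs:

```shell
# Illustrative: verify that $PREFIX/lib was prepended to the search paths.
PREFIX=/opt/singa-deps            # example install location, adjust to yours
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
# the first entry should now live under $PREFIX
echo "${LD_LIBRARY_PATH%%:*}"     # /opt/singa-deps/lib
```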
## Building SINGA from source
@@ -60,18 +72,16 @@ There are two ways to build SINGA,
$ ./configure --enable-lmdb
+<!---
+Zhongle: please update the code to use the follow command
-The SINGA test is not included by default settings. If you want to run the
-test, please compile with `--enable-test`. You can run:
-
-
- $ ./configure --enable-test
- $ make
+ $ make test
After compilation, you will find the binary file singatest. Just run it!
More details about configure script can be found by running:
- $ ./configure --help
+ $ ./configure -h
+-->
After compiling SINGA successfully, the *libsinga.so* and the executable file
*singa* will be generated into *.libs/* folder.
@@ -79,8 +89,15 @@ After compiling SINGA successfully, the
If some dependent libraries are missing (or not detected), you can use the
following script to download and install them:
+<!---
+to be updated after zhongle changes the code to use
+
+ ./install.sh libname \-\-prefix=
+
+-->
+
$ cd thirdparty
- $ ./install.sh MISSING_LIBRARY_NAME1 YOUR_INSTALL_PATH1 MISSING_LIBRARY_NAME2 YOUR_INSTALL_PATH2 ...
+ $ ./install.sh LIB_NAME PREFIX
If you do not specify the installation path, the library will be installed in
the default folder specified by the software itself. For example, if you want
@@ -90,7 +107,7 @@ to install `zeromq` library in the defau
Or, if you want to install it into another folder,
- $ ./install.sh zeromq --prefix=YOUR_FOLDER
+ $ ./install.sh zeromq PREFIX
You can also install all dependencies in */usr/local* directory:
@@ -98,8 +115,7 @@ You can also install all dependencies in
Here is a table showing the first arguments:
- MISSING_LIBRARY_NAME LIBRARIES
- cmake cmake tools
+ LIB_NAME LIBRARIES
czmq* czmq lib
glog glog lib
lmdb lmdb lib
@@ -112,50 +128,120 @@ Here is a table showing the first argume
indicate `zeromq` location.
The installation commands of `czmq` is:
- $./install.sh czmq /usr/local /usr/local/zeromq
+<!---
+to be updated to
+
+ $./install.sh czmq \-\-prefix=/usr/local \-\-zeromq=/usr/local/zeromq
+-->
+
+ $./install.sh czmq /usr/local -f=/usr/local/zeromq
After the execution, `czmq` will be installed in */usr/local*. The last path
specifies the path to zeromq.
### FAQ
+* Q1: I get error `./configure --> cannot find blas_segmm() function` even
+though I have installed OpenBLAS.
+
+ A1: This means the compiler cannot find the `OpenBLAS` library. If you installed
+ it to $PREFIX (e.g., /opt/OpenBLAS), then you need to export it as
+
+ $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
+ # e.g.,
+ $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH
+
+
+* Q2: I get error `cblas.h no such file or directory exists`.
+
+ A2: You need to add the folder containing cblas.h to CPLUS_INCLUDE_PATH,
+ e.g.,
+
+ $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
+ # e.g.,
+ $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
+ # then reconfigure and make SINGA
+ $ ./configure
+ $ make
+
+
+* Q3: While compiling SINGA, I get error `SSE2 instruction set not enabled`
-Q1:While compiling SINGA and installing `glog` on max OS X, I get fatal error
+ A3: You can try the following command:
+
+ $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
+
+
+* Q4: I get `ImportError: cannot import name enum_type_wrapper` from
+google.protobuf.internal when I try to import .py files.
+
+ A4: After installing Google protobuf via `make install`, you also need to
+ install the Python runtime libraries. Go to the protobuf source directory and run:
+
+ $ cd /PROTOBUF/SOURCE/FOLDER
+ $ cd python
+ $ python setup.py build
+ $ python setup.py install
+
+ You may need `sudo` when installing the Python runtime libraries into
+ a system folder.
+
+
+* Q5: I get a linking error caused by gflags.
+
+ A5: SINGA does not depend on gflags, but you may have installed glog with
+ gflags support. In that case you can reinstall glog using *thirdparty/install.sh*
+ into another folder and export LDFLAGS and CPPFLAGS to include that folder.
+
+
+* Q6: While compiling SINGA and installing `glog` on Mac OS X, I get fatal error
`'ext/slist' file not found`
-A1:Please install `glog` individually and try :
+ A6: Please install `glog` separately and try:
- $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='stdlib=libstdc++'
+ $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'
+* Q7: When I start a training job, it reports an error like "ZOO_ERROR...zk retcode=-4...".
-Q2:While compiling SINGA, I get error `SSE2 instruction set not enabled`
+ A7: This is because zookeeper has not been started. Please start the zookeeper service by
-A2:You can try following command:
+ $ ./bin/zk-service.sh start
- $ make CFLAGS='-msse2' CXXFLAGS='-msse2'
+ If the error persists, it is probably because Java is not installed. You can
+ check this by
-Q3:I get error `./configure --> cannot find blas_segmm() function` even I
-run `install.sh OpenBLAS`.
+ $ java -version
-A3:Since `OpenBLAS` library is installed in `/opt` folder by default or
-`/other/folder` by your preference, you may edit your environment settings.
-You need add its default installation directories before linking, just
-run:
+* Q8: When I build OpenBLAS from source, I am told that I need a Fortran compiler.
- $ export LDFLAGS=-L/opt
+ A8: You can compile OpenBLAS by
-Or as an alternative option, you can also edit LIBRARY_PATH to figure it out.
+ $ make ONLY_CBLAS=1
+ or install it using
-Q4:I get `ImportError: cannot import name enum_type_wrapper` from
-google.protobuf.internal when I try to import .py files.
+ $ sudo apt-get install libopenblas-dev
+
+ or
+
+ $ sudo yum install openblas-devel
+
+ It is worth noting that you need root access to run the last two commands.
+ Remember to set the environment variables to include the header and library
+ paths of OpenBLAS after installation (please refer to the Dependencies section).
+
+* Q9: When I build protocol buffer, it reports that GLIBCXX_3.4.20 is not found in /usr/lib64/libstdc++.so.6.
+
+ A9: This means the linker found libstdc++.so.6 but that library
+ belongs to an older version of GCC than was used to compile and link the
+ program. The program depends on code defined in
+ the newer libstdc++ that belongs to the newer version of GCC, so the linker
+ must be told how to find the newer libstdc++ shared library.
+ The simplest way to fix this is to find the correct libstdc++ and add its
+ directory to LD_LIBRARY_PATH. For example, if GLIBCXX_3.4.20 is listed in the
+ output of the following command,
-A4:After install google protobuf by `make install`, we should install python
-runtime libraries. Go to protobuf source directory, run:
+ $ strings /usr/local/lib64/libstdc++.so.6 | grep GLIBCXX
- $ cd /PROTOBUF/SOURCE/FOLDER
- $ cd python
- $ python setup.py build
- $ python setup.py install
+ then set the environment variable as
-You may need `sudo` when you try to install python runtime libraries in
-the system folder.
+ $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH
Modified: incubator/singa/site/trunk/content/markdown/docs/layer.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/layer.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/layer.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/layer.md Mon Sep 28 06:40:14 2015
@@ -366,12 +366,10 @@ implement a new Layer subclass.
#### Members
- LayerProto layer_proto_;
+ LayerProto layer_conf_;
Blob<float> data_, grad_;
- vector<Layer*> srclayers_, dstlayers_;
-The base layer class keeps the user configuration in `layer_proto_`. Source
-layers and destination layers are stored in `srclayers_` and `dstlayers_`, respectively.
+The base layer class keeps the user configuration in `layer_conf_`.
Almost all layers have $b$ (mini-batch size) feature vectors, which are stored
in the `data_` [Blob](../api/classsinga_1_1Blob.html) (A Blob is a chunk of memory space, proposed in
[Caffe](http://caffe.berkeleyvision.org/)).
@@ -390,14 +388,16 @@ parameters, we do not declare any `Param
#### Functions
- virtual void Setup(const LayerProto& proto, int npartitions = 1);
- virtual void ComputeFeature(Phase phase, Metric* perf) = 0;
- virtual void ComputeGradient(Phase phase) = 0;
+ virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers);
+ virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0;
+ virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0;
-The `Setup` function reads user configuration, i.e. `proto`, and information
+The `Setup` function reads user configuration, i.e. `conf`, and information
from source layers, e.g., mini-batch size, to set the
shape of the `data_` (and `grad_`) field as well
-as some other layer specific fields. If `npartitions` is larger than 1, then
+as some other layer specific fields.
+<!---
+If `npartitions` is larger than 1, then
users need to reduce the sizes of `data_`, `grad_` Blobs or Param objects. For
example, if the `partition_dim=0` and there is no source layer, e.g., this
layer is a (bottom) data layer, then its `data_` and `grad_` Blob should have
@@ -405,8 +405,9 @@ layer is a (bottom) data layer, then its
dimension 0, then this layer should have the same number of feature vectors as
the source layer. More complex partition cases are discussed in
[Neural net partitioning](neural-net.html#neural-net-partitioning). Typically, the
-Setup function just set the shapes of `data_` Blobs and Param objects. Memory
-will not be allocated until computation over the data structure happens.
+Setup function just sets the shapes of `data_` Blobs and Param objects.
+-->
+Memory will not be allocated until computation over the data structure happens.
The `ComputeFeature` function evaluates the feature blob by transforming (e.g.
convolution and pooling) features from the source layers. `ComputeGradient`
@@ -434,6 +435,8 @@ logics as long as the two virtual functi
the `TrainOneBatch` function. The `Setup` function may also be overridden to
read specific layer configuration.
+The [RNNLM](rnn.html) example provides a couple of user-defined layers that you can refer to as examples.
+
#### Layer specific protocol message
To implement a new layer, the first step is to define the layer specific
@@ -489,9 +492,9 @@ The new layer subclass can be implemente
class FooLayer : public singa::Layer {
public:
- void Setup(const LayerProto& proto, int npartitions = 1) override;
- void ComputeFeature(Phase phase, Metric* perf) override;
- void ComputeGradient(Phase phase) override;
+ void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
+ void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
+ void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;
private:
// members
@@ -500,7 +503,7 @@ The new layer subclass can be implemente
Users must override the two virtual functions to be called by the
`TrainOneBatch` for either BP or CD algorithm. Typically, the `Setup` function
will also be overridden to initialize some members. The user configured fields
-can be accessed through `layer_proto_` as shown in the above paragraphs.
+can be accessed through `layer_conf_` as shown in the above paragraphs.
#### New Layer subclass registration
Modified: incubator/singa/site/trunk/content/markdown/docs/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/programming-guide.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/programming-guide.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/programming-guide.md Mon Sep 28 06:40:14 2015
@@ -69,7 +69,7 @@ An example main function is like
auto jobConf = driver.job_conf();
// update jobConf
- driver.Submit(resume, jobConf);
+ driver.Train(resume, jobConf);
return 0;
}
Modified: incubator/singa/site/trunk/content/markdown/docs/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/quick-start.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/quick-start.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/quick-start.md Mon Sep 28 06:40:14 2015
@@ -48,6 +48,7 @@ available at [CNN example](cnn.html).
Download the dataset and create the data shards for training and testing.
cd examples/cifar10/
+ cp Makefile.example Makefile
make download
make create
Modified: incubator/singa/site/trunk/content/markdown/docs/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/rbm.md?rev=1705605&r1=1705604&r2=1705605&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/rbm.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/rbm.md Mon Sep 28 06:40:14 2015
@@ -192,7 +192,7 @@ The neural net configuration is (with la
To load w0 and b02 from RBM0's checkpoint file, we configure the `checkpoint_path` as,
- checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0.bin"
+ checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
cluster{
workspace: "examples/rbm/rbm2"
}
@@ -337,10 +337,10 @@ configuration is (with some of the middl
To load pre-trained parameters from the 4 RBMs' checkpoint file we configure `checkpoint_path` as
### Checkpoint Configuration
- checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0.bin"
- checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0.bin"
- checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0.bin"
- checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0.bin"
+ checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
+ checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
+ checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
+ checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"
## Visualization Results