Posted to commits@singa.apache.org by wa...@apache.org on 2020/04/05 03:35:26 UTC

[singa] branch dev updated: change autograd doc to rst

This is an automated email from the ASF dual-hosted git repository.

wangwei pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/singa.git


The following commit(s) were added to refs/heads/dev by this push:
     new d85931c  change autograd doc to rst
     new cc63fac  Merge pull request #654 from joddiy/update_autograd_doc
d85931c is described below

commit d85931c1c501ae71bd612e443aa594fe0a4cc59a
Author: joddiy <jo...@qq.com>
AuthorDate: Sun Apr 5 01:33:42 2020 +0800

    change autograd doc to rst
---
 doc/en/docs/autograd.md  | 166 -----------------------------------------------
 doc/en/docs/autograd.rst |  23 +++++++
 python/singa/autograd.py | 162 ++++++++++++++++++++++-----------------------
 3 files changed, 105 insertions(+), 246 deletions(-)

diff --git a/doc/en/docs/autograd.md b/doc/en/docs/autograd.md
deleted file mode 100644
index 30cf28e..0000000
--- a/doc/en/docs/autograd.md
+++ /dev/null
@@ -1,166 +0,0 @@
-<!--
-    Licensed to the Apache Software Foundation (ASF) under one
-    or more contributor license agreements.  See the NOTICE file
-    distributed with this work for additional information
-    regarding copyright ownership.  The ASF licenses this file
-    to you under the Apache License, Version 2.0 (the
-    "License"); you may not use this file except in compliance
-    with the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-    Unless required by applicable law or agreed to in writing,
-    software distributed under the License is distributed on an
-    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-    KIND, either express or implied.  See the License for the
-    specific language governing permissions and limitations
-    under the License.
--->
-
-
-# Autograd in Singa
-
-There are two typical ways to implement autograd: via symbolic differentiation like [Theano](http://deeplearning.net/software/theano/index.html) or via reverse differentiation like [Pytorch](https://pytorch.org/docs/stable/notes/autograd.html). Singa follows the Pytorch way, i.e., it records the computation graph and applies backward propagation automatically after forward propagation. The autograd algorithm is explained in detail [here](https://pytorch.org/docs/stable/notes/autograd.html). We e [...]
-
-## Relevant Modules
-
-There are three classes involved in autograd, namely `singa.tensor.Tensor`, `singa.autograd.Operation`, and `singa.autograd.Layer`. In the rest of this article, we use tensor, operation, and layer to refer to an instance of the respective class.
-
-### Tensor
-
-Three attributes of Tensor are used by autograd:
--  `.creator` is an `Operation` instance. It records the operation that generates the Tensor instance.
--  `.requires_grad` is a boolean variable. It is used to indicate that the autograd algorithm needs to compute the gradient of the tensor (i.e., the owner). For example, during backpropagation, the gradients of the tensors for the weight matrix of a linear layer and the feature maps of a convolution layer (not the bottom layer) should be computed.
--  `.stores_grad` is a boolean variable. It is used to indicate that the gradient of the owner tensor should be stored and output by the backward function. For example, the gradient of the feature maps is computed during backpropagation, but is not included in the output of the backward function. 
-
-Programmers can change `requires_grad` and `stores_grad` of a Tensor instance. For example, if the latter is set to True, the corresponding gradient is included in the output of the backward function. Note that if `stores_grad` is True, then `requires_grad` must also be True, but not vice versa.
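-
-For example, a parameter tensor is typically created with both flags set to True, while an input tensor needs neither. A minimal sketch (the shapes here are arbitrary; the constructor arguments are the same ones used in the training example below):
-
-```
-from singa.tensor import Tensor
-
-# parameter: its gradient is computed during backward() and returned for updating
-w = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
-
-# input data: no gradient needs to be computed or stored for it
-x = Tensor(shape=(4, 2), requires_grad=False, stores_grad=False)
-```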
-
-
-### Operation
-
-It takes one or more `Tensor` instances as input, and then outputs one or more `Tensor` instances. For example, ReLU can be implemented as a specific Operation subclass. When an `Operation` instance is called (after instantiation), the following two steps are executed:
-
-1. record the source operations, i.e., the `creator`s of the input tensors;
-2. do the calculation by calling the member function `.forward()`.
-
-There are two member functions for the forward and backward passes, i.e., `.forward()` and `.backward()`. They take `Tensor.data` as inputs (the type is `CTensor`) and output `CTensor`s. To add a specific operation, the `Operation` subclass should implement its own `.forward()` and `.backward()`. The `backward()` function is called by the `backward()` function of autograd automatically during backward propagation to compute the gradients of the inputs (according to the `requires_grad` field).
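-
-As an illustration, a new operation could be added with a sketch like the one below. The `Square` operation and its `square` helper are hypothetical, but they follow the pattern of the built-in operations (e.g., `Matmul`): cache the input during training and use the CTensor math functions from the `singa_wrap` module (the import paths are assumed for illustration):
-
-```
-from singa import autograd
-from singa import singa_wrap as singa
-
-class Square(autograd.Operation):
-    # y = x * x, applied elementwise (illustrative only)
-    def forward(self, x):
-        # x is a CTensor; cache it for the backward pass when training
-        if autograd.training:
-            self.input = x
-        return singa.__mul__(x, x)
-
-    def backward(self, dy):
-        # dy is the gradient w.r.t. the output; dL/dx = dy * 2x
-        return singa.__mul__(dy, singa.MultFloat(self.input, 2.0))
-
-def square(x):
-    # x is a Tensor; calling the Operation instance records it as the creator
-    return Square()(x)[0]
-```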
-
-### Layer
-
-For those operations that require parameters, we package them into a new class, `Layer`. For example, the convolution operation is wrapped into a convolution layer. `Layer` manages (stores) the parameters and calls the corresponding `Operation`s to implement the transformation.
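-
-For instance, a simplified linear layer could be sketched as follows. The `MyLinear` class is hypothetical and modeled on the MLP example below; the built-in `autograd.Linear` is more complete (e.g., it also checks that its inputs and parameters are on the same device):
-
-```
-from singa.tensor import Tensor
-from singa import autograd
-
-class MyLinear(autograd.Layer):
-    def __init__(self, in_features, out_features):
-        # the layer creates and stores the parameter tensors
-        self.W = Tensor(shape=(in_features, out_features),
-                        requires_grad=True, stores_grad=True)
-        self.W.gaussian(0.0, 0.1)
-        self.b = Tensor(shape=(1, out_features),
-                        requires_grad=True, stores_grad=True)
-        self.b.set_value(0.0)
-
-    def __call__(self, x):
-        # delegate the actual computation to operations
-        y = autograd.matmul(x, self.W)
-        return autograd.add_bias(y, self.b)
-```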
-
-
-
-## Examples
-
-Multiple examples are provided in the [example folder](https://github.com/apache/singa/tree/master/examples/autograd). We explain two representative examples here.
-
-### Operation only
-
-The following code implements an MLP model using only Operation instances (no Layer instances).
-
-#### Import packages
-
-```
-from singa.tensor import Tensor
-from singa import autograd
-from singa import opt
-```
-
-#### Create weight matrix and bias vector
-
-The parameter tensors are created with both `requires_grad` and `stores_grad` set to True.
-
-```
-w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
-w0.gaussian(0.0, 0.1)
-b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
-b0.set_value(0.0)
-
-w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
-w1.gaussian(0.0, 0.1)
-b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
-b1.set_value(0.0)
-```
-
-#### Training
-```
-inputs = Tensor(data=data)  # data matrix
-target = Tensor(data=label) # label vector
-autograd.training = True    # for training
-sgd = opt.SGD(0.05)   # optimizer
-
-for i in range(10):
-    x = autograd.matmul(inputs, w0) # matrix multiplication
-    x = autograd.add_bias(x, b0)    # add the bias vector
-    x = autograd.relu(x)            # ReLU activation operation
-
-    x = autograd.matmul(x, w1)
-    x = autograd.add_bias(x, b1)
-    
-    loss = autograd.softmax_cross_entropy(x, target)
-    
-    for p, g in autograd.backward(loss):        
-        sgd.update(p, g)
-```
-
-
-### Operation + Layer
-
-The following [example](https://github.com/apache/singa/blob/master/examples/autograd/mnist_cnn.py) implements a CNN model using layers provided by the autograd module.
-
-#### Create the layers
-
-```
-conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)
-bn1 = autograd.BatchNorm2d(32)
-pooling1 = autograd.MaxPool2d(3, 1, padding=1)
-conv21 = autograd.Conv2d(32, 16, 3, padding=1)
-conv22 = autograd.Conv2d(32, 16, 3, padding=1)
-bn2 = autograd.BatchNorm2d(32)
-linear = autograd.Linear(32 * 28 * 28, 10)    
-pooling2 = autograd.AvgPool2d(3, 1, padding=1)
-```
-
-#### Define the forward function
-
-The operations in the forward pass will be recorded automatically for backward propagation.
-
-```
-def forward(x, t):
-    # x is the input data (a batch of images)
-    # t is the label vector (a batch of integers)
-    y = conv1(x)           # Conv layer  
-    y = autograd.relu(y)   # ReLU operation
-    y = bn1(y)             # BN layer
-    y = pooling1(y)        # Pooling Layer
-    
-    # two parallel convolution layers
-    y1 = conv21(y)
-    y2 = conv22(y)
-    y = autograd.cat((y1, y2), 1)  # cat operation
-    y = autograd.relu(y)           # ReLU operation
-    y = bn2(y)
-    y = pooling2(y)
-
-    y = autograd.flatten(y)        # flatten operation
-    y = linear(y)                  # Linear layer
-    loss = autograd.softmax_cross_entropy(y, t)  # operation 
-    return loss, y
-```
-
-#### Training
-
-```
-autograd.training = True
-for epoch in range(epochs):
-    for i in range(batch_number):
-        inputs = tensor.Tensor(device=dev, data=x_train[
-                               i * batch_sz:(1 + i) * batch_sz], stores_grad=False)
-        targets = tensor.Tensor(device=dev, data=y_train[
-                                i * batch_sz:(1 + i) * batch_sz], requires_grad=False, stores_grad=False)
-
-        loss, y = forward(inputs, targets) # forward the net
-    
-        for p, gp in autograd.backward(loss):  # auto backward
-            sgd.update(p, gp)
-```
diff --git a/doc/en/docs/autograd.rst b/doc/en/docs/autograd.rst
new file mode 100644
index 0000000..b8d4203
--- /dev/null
+++ b/doc/en/docs/autograd.rst
@@ -0,0 +1,23 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+   or more contributor license agreements.  See the NOTICE file
+   distributed with this work for additional information
+   regarding copyright ownership.  The ASF licenses this file
+   to you under the Apache License, Version 2.0 (the
+   "License"); you may not use this file except in compliance
+   with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing,
+   software distributed under the License is distributed on an
+   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+   KIND, either express or implied.  See the License for the
+   specific language governing permissions and limitations
+   under the License.
+
+
+Autograd API
+============
+
+.. automodule:: singa.autograd
+   :members:
\ No newline at end of file
diff --git a/python/singa/autograd.py b/python/singa/autograd.py
index 4e7593f..ca23177 100644
--- a/python/singa/autograd.py
+++ b/python/singa/autograd.py
@@ -596,7 +596,7 @@ class Matmul(Operation):
 
     def forward(self, x, w):
         """
-        Return np.matmul(x,w), where x and w are CTensor.
+        Return `np.matmul(x,w)`, where x and w are CTensor.
         """
         if training:
             self.input = (x, w)
@@ -617,7 +617,7 @@ class Matmul(Operation):
 
 def matmul(x, w):
     """
-    Return np.matmul(x,w), where x and w are Tensor.
+    Return `np.matmul(x,w)`, where x and w are Tensor.
     """
     return Matmul()(x, w)[0]
 
@@ -778,8 +778,8 @@ def reshape(x, shape):
 
 class PRelu(Operation):
     """
-    PRelu applies the function f(x) = slope * x for x < 0, 
-    f(x) = x for x >= 0 to the data tensor elementwise.
+    PRelu applies the function `f(x) = slope * x` for x < 0, 
+    `f(x) = x` for x >= 0 to the data tensor elementwise.
     """
 
     def __init__(self):
@@ -828,8 +828,8 @@ class PRelu(Operation):
 
 def prelu(x, slope):
     """
-    PRelu applies the function f(x) = slope * x for x < 0, 
-    f(x) = x for x >= 0 to the data tensor elementwise.
+    PRelu applies the function `f(x) = slope * x` for x < 0, 
+    `f(x) = x` for x >= 0 to the data tensor elementwise.
     Args:
         x (Tensor): matrix.
     Return:
@@ -848,7 +848,7 @@ class Add(Operation):
 
     def forward(self, a, b):
         """
-        Return a+b, where a and b are CTensor.
+        Return `a+b`, where a and b are CTensor.
         """
         res = singa.__add__(a, b)
         if training:
@@ -877,14 +877,14 @@ class Add(Operation):
 
 def add(a, b):
     """
-    Return a+b, where a and b are Tensor.
+    Return `a+b`, where a and b are Tensor.
     """
     return Add()(a, b)[0]
 
 
 class Elu(Operation):
     """
-    f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0., is applied to 
+    `f(x) = alpha * (exp(x) - 1.)` for x < 0, `f(x) = x` for x >= 0, is applied to
     the tensor elementwise.
     """
     def __init__(self, alpha=1.):
@@ -932,7 +932,7 @@ class Elu(Operation):
 
 def elu(x, alpha=1):
     """
-    f(x) = alpha * (exp(x) - 1.) for x < 0, f(x) = x for x >= 0., is applied to 
+    `f(x) = alpha * (exp(x) - 1.)` for x < 0, `f(x) = x` for x >= 0, is applied to
     the tensor elementwise.
     Args:
         x (Tensor): matrix
@@ -953,7 +953,7 @@ class Equal(Operation):
 
     def forward(self, x, y):
         """
-        Return a=b, where a and b are CTensor.
+        Return the elementwise comparison `x == y`, where x and y are CTensor.
         """
         m = singa.__sub__(x, y)
         cur = singa.__mul__(singa.GEFloat(m, 0), singa.LEFloat(m, 0))
@@ -971,14 +971,14 @@ class Equal(Operation):
 
 def equal(x, y):
     """
-    Return a=b, where a and b are Tensor.
+    Return the elementwise comparison `x == y`, where x and y are Tensor.
     """
     return Equal()(x, y)[0]
 
 
 class SeLU(Operation):
     """
-    y = gamma * (alpha * e^x - alpha) for x <= 0, y = gamma * x for x > 0 
+    `y = gamma * (alpha * e^x - alpha)` for x <= 0, `y = gamma * x` for x > 0 
     is applied to the tensor elementwise.
     """
 
@@ -1032,7 +1032,7 @@ class SeLU(Operation):
 
 def selu(x, alpha=1.67326, gamma=1.0507):
     """
-    y = gamma * (alpha * e^x - alpha) for x <= 0, y = gamma * x for x > 0 
+    `y = gamma * (alpha * e^x - alpha)` for x <= 0, `y = gamma * x` for x > 0 
     is applied to the tensor elementwise.
     Args:
         x (Tensor): matrix
@@ -1248,8 +1248,8 @@ def ctensor2numpy(x):
 class Flatten(Operation):
     """
     Flattens the input tensor into a 2D matrix. If input tensor has shape 
-    (d_0, d_1, ... d_n) then the output will have shape (d_0 X d_1 ... 
-    d_(axis-1), d_axis X d_(axis+1) ... X dn).
+    `(d_0, d_1, ... d_n)` then the output will have shape `(d_0 X d_1 ... 
+    d_(axis-1), d_axis X d_(axis+1) ... X dn)`.
     """
 
     def __init__(self, axis=1):
@@ -1260,8 +1260,8 @@ class Flatten(Operation):
                 value for axis must be in the range [-r, r], where r is the 
                 rank of the input tensor. Negative value means counting 
                 dimensions from the back. When axis = 0, the shape of the 
-                output tensor is (1, (d_0 X d_1 ... d_n), where the shape 
-                of the input tensor is (d_0, d_1, ... d_n).
+                output tensor is `(1, (d_0 X d_1 ... d_n)`, where the shape 
+                of the input tensor is `(d_0, d_1, ... d_n)`.
         Returns:
             the result CTensor
         """
@@ -1303,8 +1303,8 @@ class Flatten(Operation):
 def flatten(x, axis=1):
     """
     Flattens the input tensor into a 2D matrix. If input tensor has shape 
-    (d_0, d_1, ... d_n) then the output will have shape (d_0 X d_1 ... 
-    d_(axis-1), d_axis X d_(axis+1) ... X dn).
+    `(d_0, d_1, ... d_n)` then the output will have shape `(d_0 X d_1 ... 
+    d_(axis-1), d_axis X d_(axis+1) ... X dn)`.
     Args:
         x (Tensor): the input tensor
         axis (int): Indicate up to which input dimensions (exclusive) 
@@ -1312,8 +1312,8 @@ def flatten(x, axis=1):
             value for axis must be in the range [-r, r], where r is the 
             rank of the input tensor. Negative value means counting 
             dimensions from the back. When axis = 0, the shape of the 
-            output tensor is (1, (d_0 X d_1 ... d_n), where the shape 
-            of the input tensor is (d_0, d_1, ... d_n).
+            output tensor is `(1, (d_0 X d_1 ... d_n)`, where the shape 
+            of the input tensor is `(d_0, d_1, ... d_n)`.
     Returns:
         the result Tensor
     """
@@ -2896,7 +2896,7 @@ def atanh(x):
 
 class Sigmoid(Operation):
     """
-    y = 1 / (1 + exp(-x)), is applied to the tensor elementwise.
+    `y = 1 / (1 + exp(-x))`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -2930,7 +2930,7 @@ class Sigmoid(Operation):
 
 def sigmoid(x):
     """
-    y = 1 / (1 + exp(-x)), is applied to the tensor elementwise.
+    `y = 1 / (1 + exp(-x))`, is applied to the tensor elementwise.
     Args:
         x (Tensor): Input tensor
     Returns: 
@@ -2950,7 +2950,7 @@ class Mul(Operation):
 
     def forward(self, a, b):
         """
-        Return np.multiply(a,b), where a and b are CTensor.
+        Return `np.multiply(a,b)`, where a and b are CTensor.
         """
         # todo we cannot support mul op for int tensors
         _a, _b = a, b
@@ -2991,7 +2991,7 @@ class Mul(Operation):
 
 def mul(x, y):
     """
-    Return np.multiply(x,y), where a and b are Tensor.
+    Return `np.multiply(x,y)`, where x and y are Tensor.
     """
     return Mul()(x, y)[0]
 
@@ -3332,12 +3332,12 @@ class LSTM(RNN_Base):
 
 class Abs(Operation):
     """
-    y = abs(x), is applied to the tensor elementwise.
+    `y = abs(x)`, is applied to the tensor elementwise.
     """
 
     def forward(self, a):
         """
-        Return abs(a), where a is CTensor.
+        Return `abs(a)`, where a is CTensor.
         """
         if training:
             self.input = a
@@ -3364,12 +3364,12 @@ def abs(a):
 
 class Exp(Operation):
     """
-    y = exp(x), is applied to the tensor elementwise.
+    `y = exp(x)`, is applied to the tensor elementwise.
     """
 
     def forward(self, a):
         """
-        Return exp(a), where a is Tensor.
+        Return `exp(a)`, where a is Tensor.
         """
         if training:
             self.input = a
@@ -3389,14 +3389,14 @@ class Exp(Operation):
 
 def exp(a):
     """
-    Return exp(a), where a is Tensor.
+    Return `exp(a)`, where a is Tensor.
     """
     return Exp()(a)[0]
 
 
 class LeakyRelu(Operation):
     """
-    f(x) = alpha * x for x < 0, f(x) = x for x >= 0, is applied to the tensor elementwise.
+    `f(x) = alpha * x` for x < 0, `f(x) = x` for x >= 0, is applied to the tensor elementwise.
     """
 
     def __init__(self, a):
@@ -3441,7 +3441,7 @@ class LeakyRelu(Operation):
 
 def leakyrelu(x, a=0.01):
     """
-    f(x) = alpha * x for x < 0, f(x) = x for x >= 0 is applied to the tensor 
+    `f(x) = alpha * x` for x < 0, `f(x) = x` for x >= 0 is applied to the tensor 
     elementwise.
     Args:
         x (Tensor): Input tensor
@@ -3497,7 +3497,7 @@ def sign(a):
 
 class Pow(Operation):
     """
-    f(x) = a^b, is applied to the tensor elementwise.
+    `f(x) = a^b`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -3505,7 +3505,7 @@ class Pow(Operation):
 
     def forward(self, a, b):
         """
-        Return a^b, where a and b are CTensor.
+        Return `a^b`, where a and b are CTensor.
         """
         res = singa.Pow(a, b)
         if training:
@@ -3541,14 +3541,14 @@ class Pow(Operation):
 
 def pow(a, b):
     """
-    Return a^b, where a and b are Tensor.
+    Return `a^b`, where a and b are Tensor.
     """
     return Pow()(a, b)[0]
 
 
 class SoftSign(Operation):
     """
-    Calculates the softsign (x/(1+|x|)) of the given input tensor element-wise.
+    Calculates the softsign `(x/(1+|x|))` of the given input tensor element-wise.
     """
 
     def __init__(self):
@@ -3556,7 +3556,7 @@ class SoftSign(Operation):
 
     def forward(self, x):
         """
-        Return (x/(1+|x|)), where x is CTensor.
+        Return `(x/(1+|x|))`, where x is CTensor.
         """
         # y = x / (1 + np.abs(x))
         if training:
@@ -3581,14 +3581,14 @@ class SoftSign(Operation):
 
 def softsign(x):
     """
-    Return (x/(1+|x|)), where x is Tensor.
+    Return `(x/(1+|x|))`, where x is Tensor.
     """
     return SoftSign()(x)[0]
 
 
 class Sqrt(Operation):
     """
-    y = x^0.5, is applied to the tensor elementwise.
+    `y = x^0.5`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -3596,7 +3596,7 @@ class Sqrt(Operation):
 
     def forward(self, x):
         """
-        Return x^0.5, where x is CTensor.
+        Return `x^0.5`, where x is CTensor.
         """
         if training:
             self.input = x
@@ -3617,14 +3617,14 @@ class Sqrt(Operation):
 
 def sqrt(x):
     """
-    Return x^0.5, where x is Tensor.
+    Return `x^0.5`, where x is Tensor.
     """
     return Sqrt()(x)[0]
 
 
 class SoftPlus(Operation):
     """
-    y = ln(exp(x) + 1) is applied to the tensor elementwise.
+    `y = ln(exp(x) + 1)` is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -3632,7 +3632,7 @@ class SoftPlus(Operation):
 
     def forward(self, x):
         """
-        Return ln(exp(x) + 1), where x is CTensor.
+        Return `ln(exp(x) + 1)`, where x is CTensor.
         """
         #f(x) = ln(exp(x) + 1)
         if training:
@@ -3656,7 +3656,7 @@ class SoftPlus(Operation):
 
 def softplus(x):
     """
-    Return ln(exp(x) + 1), where x is Tensor.
+    Return `ln(exp(x) + 1)`, where x is Tensor.
     """
     return SoftPlus()(x)[0]
 
@@ -3672,7 +3672,7 @@ class Sub(Operation):
 
     def forward(self, a, b):
         """
-        Return a-b, where x is CTensor.
+        Return `a-b`, where a and b are CTensor.
         """
         res = singa.__sub__(a, b)
         if training:
@@ -3790,7 +3790,7 @@ def min(*l):
 
 class Log(Operation):
     """
-    y = log(x), is applied to the tensor elementwise.
+    `y = log(x)`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -3798,7 +3798,7 @@ class Log(Operation):
 
     def forward(self, x):
         """
-        Return log(x), where x is CTensor.
+        Return `log(x)`, where x is CTensor.
         """
         if training:
             self.input = x
@@ -3825,7 +3825,7 @@ def log(x):
 
 class HardSigmoid(Operation):
     """
-    y = max(0, min(1, alpha * x + beta)), is applied to the tensor elementwise.
+    `y = max(0, min(1, alpha * x + beta))`, is applied to the tensor elementwise.
     """
 
     def __init__(self, alpha=0.2, gamma=0.5):
@@ -3871,7 +3871,7 @@ class HardSigmoid(Operation):
 
 def hardsigmoid(x, alpha=0.2, gamma=0.5):
     """
-    y = max(0, min(1, alpha * x + beta)), is applied to the tensor elementwise.
+    `y = max(0, min(1, alpha * x + beta))`, is applied to the tensor elementwise.
     Args:
         x (Tensor): matrix
         alpha (float): Value of alpha.
@@ -3963,7 +3963,7 @@ class Div(Operation):
 
     def forward(self, a, b):
         """
-        Return np.div(a,b), where a and b are CTensor.
+        Return `np.div(a,b)`, where a and b are CTensor.
         """
         res = singa.__mul__(a, singa.PowFloat(b, -1.0))
         # res = singa.__div__(a, b)
@@ -3999,7 +3999,7 @@ class Div(Operation):
 
 def div(a, b):
     """
-    Return np.div(a,b), where a and b are Tensor.
+    Return `np.div(a,b)`, where a and b are Tensor.
     """
     return Div()(a, b)[0]
 
@@ -4029,6 +4029,8 @@ class Shape(Operation):
         """
         Args:
             dy (CTensor): the gradient tensor from upper operations
+        Returns: 
+            list of int, the shape of dy
         """
         return list(dy.shape())
 
@@ -4120,7 +4122,7 @@ def max(*l):
     Args:
         *x (a list of Tensor): List of tensors for max.
     Returns: 
-        CTensor, the output
+        Tensor, the output
     """
     return Max()(*l)[0]
 
@@ -4135,7 +4137,7 @@ class And(Operation):
 
     def forward(self, a, b):
         """
-        Return np.logical_and(a,b), where a and b are CTensor.
+        Return `np.logical_and(a,b)`, where a and b are CTensor.
         """
         m = singa.__mul__(a, b)
         cur = singa.PowFloat(singa.Sign(m), 2)
@@ -4154,7 +4156,7 @@ class And(Operation):
 
 def _and(a, b):
     """
-    Return np.logical_and(a,b), where a and b are Tensor.
+    Return `np.logical_and(a,b)`, where a and b are Tensor.
     """
     return And()(a, b)[0]
 
@@ -4169,7 +4171,7 @@ class Or(Operation):
 
     def forward(self, a, b):
         """
-        Return np.logical_or(a,b), where a and b are CTensor.
+        Return `np.logical_or(a,b)`, where a and b are CTensor.
         """
         m = singa.__add__(singa.PowFloat(singa.Sign(a), 2.0),
                           singa.PowFloat(singa.Sign(b), 2.0))
@@ -4180,7 +4182,7 @@ class Or(Operation):
     def backward(self, dy):
         """
         Args:
-            dy (CTensor): the gradient tensor from upper operations
+            dy (CTensor): data for the `dL / dy`, L is the loss.
         Raises:
             AssertionError: no backward function for this operator
         """
@@ -4204,7 +4206,7 @@ class Not(Operation):
 
     def forward(self, x):
         """
-        Return np.logical_not(x), where x is CTensor.
+        Return `np.logical_not(x)`, where x is CTensor.
         """
         mask0 = singa.GEFloat(x, 0)
         mask1 = singa.LEFloat(x, 0)
@@ -4224,7 +4226,7 @@ class Not(Operation):
 
 def _not(x):
     """
-    Return np.logical_not(x), where x is Tensor.
+    Return `np.logical_not(x)`, where x is Tensor.
     """
     return Not()(x)[0]
 
@@ -4239,7 +4241,7 @@ class Xor(Operation):
 
     def forward(self, a, b):
         """
-        Return np.logical_xor(a,b), where a and b are CTensor.
+        Return `np.logical_xor(a,b)`, where a and b are CTensor.
         """
         m = singa.__sub__(singa.PowFloat(singa.Sign(a), 2.0),
                           singa.PowFloat(singa.Sign(b), 2.0))
@@ -4259,14 +4261,14 @@ class Xor(Operation):
 
 def _xor(a, b):
     """
-    Return np.logical_xor(a,b), where a and b are Tensor.
+    Return `np.logical_xor(a,b)`, where a and b are Tensor.
     """
     return Xor()(a, b)[0]
 
 
 class Negative(Operation):
     """
-    y = -x, is applied to the tensor elementwise.
+    `y = -x`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -4274,7 +4276,7 @@ class Negative(Operation):
 
     def forward(self, x):
         """
-        Return -x, where x is CTensor.
+        Return `-x`, where x is CTensor.
         """
         #y=-x
         return singa.MultFloat(x, -1)
@@ -4291,14 +4293,14 @@ class Negative(Operation):
 
 def negative(x):
     """
-    Return -x, where x is Tensor.
+    Return `-x`, where x is Tensor.
     """
     return Negative()(x)[0]
 
 
 class Reciprocal(Operation):
     """
-    y = 1/x, is applied to the tensor elementwise.
+    `y = 1/x`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -4306,7 +4308,7 @@ class Reciprocal(Operation):
 
     def forward(self, x):
         """
-        Return 1/x, where x is CTensor.
+        Return `1/x`, where x is CTensor.
         """
         #y=1/x elementwise
         if training:
@@ -4335,12 +4337,12 @@ def reciprocal(x):
 
 class Gemm(Operation):
     """
-    Init a General Matrix multiplication(Gemm) operator. Compute Y = alpha * 
-    A' * B' + beta * C, where input tensor A has shape (M, K) or (K, M), input 
+    Init a General Matrix multiplication(Gemm) operator. Compute `Y = alpha * 
+    A' * B' + beta * C`, where input tensor A has shape (M, K) or (K, M), input 
     tensor B has shape (K, N) or (N, K), input tensor C is broadcastable to 
     shape (M, N), and output tensor Y has shape (M, N).
-    A' = transpose(A) if transA else A
-    B' = transpose(B) if transB else B
+    `A' = transpose(A)` if transA else A
+    `B' = transpose(B)` if transB else B
     """
 
     def __init__(self, alpha=1.0, beta=1.0, transA=0, transB=0):
@@ -4421,12 +4423,12 @@ class Gemm(Operation):
 
 def gemm(A, B, C=None, alpha=1.0, beta=1.0, transA=0, transB=0):
     """
-    Init a General Matrix multiplication(Gemm) operator. Compute Y = alpha * 
-    A' * B' + beta * C, where input tensor A has shape (M, K) or (K, M), input 
+    Init a General Matrix multiplication(Gemm) operator. Compute `Y = alpha * 
+    A' * B' + beta * C`, where input tensor A has shape (M, K) or (K, M), input 
     tensor B has shape (K, N) or (N, K), input tensor C is broadcastable to 
     shape (M, N), and output tensor Y has shape (M, N).
-    A' = transpose(A) if transA else A
-    B' = transpose(B) if transB else B
+    `A' = transpose(A)` if transA else A
+    `B' = transpose(B)` if transB else B
     Args:
         A (Tensor): The shape of A should be (M, K) if transA is 0, or 
             (K, M) if transA is non-zero.
@@ -4586,7 +4588,7 @@ def constant_of_shape(x, value=0):
 class Dropout(Operation):
     """
     Init a Dropout, which scales the masked input data by the following equation:
-    output = scale * data * mask, scale = 1. / (1. - ratio).
+    `output = scale * data * mask`, `scale = 1. / (1. - ratio)`.
     """
 
     def __init__(self, ratio=0.5):
@@ -4628,7 +4630,7 @@ class Dropout(Operation):
 def dropout(x, ratio=0.5):
     """
     Init a Dropout, which scales the masked input data by the following 
-    equation: output = scale * data * mask, scale = 1. / (1. - ratio).
+    equation: `output = scale * data * mask`, `scale = 1. / (1. - ratio)`.
     Args:
         x (Tensor): input tensor.
         ratio (float): the ratio of random dropout, with value in [0, 1).
@@ -4901,7 +4903,7 @@ def slice(x, starts, ends, axes=None, steps=None):
 class Ceil(Operation):
     """
     Ceil takes one input data (Tensor) and produces one output data (Tensor) 
-    where the ceil is, y = ceil(x), is applied to the tensor elementwise.
+    where the ceil is, `y = ceil(x)`, is applied to the tensor elementwise.
     """
 
     def __init__(self):
@@ -4933,7 +4935,7 @@ class Ceil(Operation):
 def ceil(x):
     """
     Ceil takes one input data (Tensor) and produces one output data (Tensor) 
-    where the ceil is, y = ceil(x), is applied to the tensor elementwise.
+    where the ceil is, `y = ceil(x)`, is applied to the tensor elementwise.
     Args:
         x (Tensor): input tensor.
     Returns:
@@ -5023,7 +5025,7 @@ class Gather(Operation):
     """
     Init a Gather, Given data tensor of rank r >= 1, and indices tensor of 
     rank q, gather entries of the axis dimension of data (by default outer-most 
-    one as axis=0) indexed by indices, and concatenates them in an output tensor of rank q + (r - 1).
+    one as axis=0) indexed by indices, and concatenates them in an output tensor of rank `q + (r - 1)`.
     """
 
     def __init__(self, axis, indices):
@@ -5123,7 +5125,7 @@ def gather(x, axis, indices):
     """
     Init a Gather, Given data tensor of rank r >= 1, and indices tensor of 
     rank q, gather entries of the axis dimension of data (by default outer-most 
-    one as axis=0) indexed by indices, and concatenates them in an output tensor of rank q + (r - 1).
+    one as axis=0) indexed by indices, and concatenates them in an output tensor of rank `q + (r - 1)`.
     Args:
         x (Tensor): input tensor.
         axis (int): which axis to slice on. A negative value means counting