Posted to commits@systemml.apache.org by du...@apache.org on 2017/06/15 22:22:39 UTC

[1/3] systemml git commit: [SYSTEMML-1686] Fix 2D transpose convolution filter shape & input size

Repository: systemml
Updated Branches:
  refs/heads/master 009561384 -> 17838a3d3


[SYSTEMML-1686] Fix 2D transpose convolution filter shape & input size

Currently, the transpose conv2d layer (`nn/layers/conv2d_transpose.dml`)
has a bug in which the filter tensor `W` has an incorrect shape, and
the `conv2d_backward_data` op has an incorrect input shape argument.
This results in an exception when the number of input channels `C` is
not equal to the number of filters `F` (i.e., the number of output channels).
Since the transpose conv2d op is the gradient of the conv2d op, the
filter tensor needs to have the shape `(C, F, Hf, Wf)` for `F` filters,
rather than `(F, C, Hf, Wf)`, in order to map from an input with `C`
channels to an output with `F` channels during the input data gradient
function (`conv2d_backward_data`) that is used in the forward pass.
Additionally, the `input_shape` argument for `conv2d_backward_data`
needs to be `(N, F, Hout, Wout)`, rather than `(N, C, Hout, Wout)`, in
order to map from an input with `C` channels to an output with `F`
channels.  Our previous test cases did not catch this issue because
the tests used `C = F = 1`.
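
For concreteness, here is a shape-level sketch (in DML, mirroring the
diff below) of the corrected forward call, where `W` is stored as
`(C, F*Hf*Wf)`, `X` is `(N, C*Hin*Win)`, and the output is
`(N, F*Hout*Wout)`:

  out = conv2d_backward_data(W, X, stride=[strideh,stridew], padding=[padh,padw],
                             input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])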

The motivation here is to think about the shape of `W` during a regular
2D conv function, i.e., `(F, C, Hf, Wf)`.  In the forward *and* backward
pass of regular 2D conv, `W` maintains the same `(F, C, Hf, Wf)` shape.
Specifically, in the backward pass, the data gradient function
`conv2d_backward_data` accepts the same `W` of shape `(F, C, Hf, Wf)`
to map from gradients wrt the output of shape `(N, F, Hout, Wout)` to
gradients wrt the input of shape `(N, C, Hin, Win)`.  In a 2D
transpose conv function, we use the 2D conv data gradient function as
the forward function.  If we use the same size terminology as from
regular 2D conv, we would be mapping from an input with `F` channels to
an output with `C` channels (which seems backward, but is correct since
the forward pass of 2D transpose conv is defined as the backward pass of
2D conv), and we would be using the same filter `W` of shape
`(F, C, Hf, Wf)`.  Now, in order to make the terminology of the 2D
transpose conv layer consistent with the rest of the library, we would
need to swap the variable names `F` and `C` so that we are mapping from
an input with `C` channels to an output with `F` channels.  Thus, we
would use a filter `W` of shape `(C, F, Hf, Wf)`.
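
As a concrete shape check (a sketch only; the sizes match the updated
test below, and `conv2d_transpose` refers to the layer script sourced
as a module), with `C = 3` input channels and `F = 2` filters of size
3x3:

  source("nn/layers/conv2d_transpose.dml") as conv2d_transpose
  [W, b] = conv2d_transpose::init(2, 3, 3, 3)  # F=2, C=3, Hf=3, Wf=3
  # W has shape (C, F*Hf*Wf) = (3, 18); b has shape (F, 1) = (2, 1),
  # so forward maps X of shape (N, C*Hin*Win) to out of shape (N, F*Hout*Wout).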

This commit fixes these issues, and improves the robustness of the
associated tests.

Closes #541.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/324bea5d
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/324bea5d
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/324bea5d

Branch: refs/heads/master
Commit: 324bea5d95c0ce4c7fe98e348b2a98ca0e888d84
Parents: 0095613
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Thu Jun 15 15:20:37 2017 -0700
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Thu Jun 15 15:20:37 2017 -0700

----------------------------------------------------------------------
 scripts/nn/layers/conv2d_builtin.dml   |  2 +-
 scripts/nn/layers/conv2d_transpose.dml | 32 ++++++-------
 scripts/nn/test/grad_check.dml         | 70 ++++++++++++-----------------
 scripts/nn/test/test.dml               | 63 +++++++++++++++++---------
 4 files changed, 88 insertions(+), 79 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/324bea5d/scripts/nn/layers/conv2d_builtin.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/conv2d_builtin.dml b/scripts/nn/layers/conv2d_builtin.dml
index bda7a9c..6b066eb 100644
--- a/scripts/nn/layers/conv2d_builtin.dml
+++ b/scripts/nn/layers/conv2d_builtin.dml
@@ -123,7 +123,7 @@ backward = function(matrix[double] dout, int Hout, int Wout,
   # Partial derivatives for convolution - built-in implementation
   dW = conv2d_backward_filter(X, dout, stride=[strideh,stridew], padding=[padh,padw],
                               input_shape=[N,C,Hin,Win], filter_shape=[F,C,Hf,Wf])
-  dX = conv2d_backward_data(W, dout, stride=[strideh, stridew], padding=[padh,padw],
+  dX = conv2d_backward_data(W, dout, stride=[strideh,stridew], padding=[padh,padw],
                             input_shape=[N,C,Hin,Win], filter_shape=[F,C,Hf,Wf])
 
   # Partial derivatives for bias vector

http://git-wip-us.apache.org/repos/asf/systemml/blob/324bea5d/scripts/nn/layers/conv2d_transpose.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/conv2d_transpose.dml b/scripts/nn/layers/conv2d_transpose.dml
index ecd7a1c..0838563 100644
--- a/scripts/nn/layers/conv2d_transpose.dml
+++ b/scripts/nn/layers/conv2d_transpose.dml
@@ -24,7 +24,7 @@
  *
  * Utilizes built-in convolution operators for higher performance.
  */
- 
+
 forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
                    int C, int Hin, int Win, int Hf, int Wf,
                    int strideh, int stridew, int padh, int padw,
@@ -37,7 +37,7 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
    *
    * Inputs:
    *  - X: Inputs, of shape (N, C*Hin*Win).
-   *  - W: Weights, of shape (F, C*Hf*Wf).
+   *  - W: Weights, of shape (C, F*Hf*Wf).
    *  - b: Biases, of shape (F, 1).
    *  - C: Number of input channels (dimensionality of depth).
    *  - Hin: Input height.
@@ -48,7 +48,7 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
    *  - stridew: Stride over width.
    *  - padh: Padding for top and bottom sides.
    *  - padw: Padding for left and right sides.
-   *  - out_padh: extra padding for top side. This should 
+   *  - out_padh: extra padding for top side. This should
    *      lie in [0, strideh-1].
    *  - out_padw: extra padding for right side. This should
    *      lie in [0, stridew-1].
@@ -59,7 +59,7 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
    *  - Wout: Output width.
    */
   N = nrow(X)
-  F = nrow(W)
+  F = nrow(b)
   Hout = strideh * (Hin-1) - 2*padh + Hf + out_padh
   Wout = stridew * (Win-1) - 2*padw + Wf + out_padw
 
@@ -80,7 +80,7 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
    * convolution.
    */
   out = conv2d_backward_data(W, X, stride=[strideh,stridew], padding=[padh,padw],
-                             input_shape=[N,C,Hout,Wout], filter_shape=[F,C,Hf,Wf])
+                             input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
 
   out = bias_add(out, b)
 }
@@ -100,7 +100,7 @@ backward = function(matrix[double] dout, int Hout, int Wout,
    *  - Hout: Output height.
    *  - Wout: Output width.
    *  - X: Inputs, of shape (N, C*Hin*Win).
-   *  - W: Weights, of shape (F, C*Hf*Wf).
+   *  - W: Weights, of shape (C, F*Hf*Wf).
    *  - b: Biases, of shape (F, 1).
    *  - C: Number of input channels (dimensionality of depth).
    *  - Hin: Input height.
@@ -114,15 +114,15 @@ backward = function(matrix[double] dout, int Hout, int Wout,
    *
    * Outputs:
    *  - dX: Gradient wrt `X`, of shape (N, C*Hin*Win).
-   *  - dW: Gradient wrt `W`, of shape (F, C*Hf*Wf).
+   *  - dW: Gradient wrt `W`, of shape (C, F*Hf*Wf).
    *  - db: Gradient wrt `b`, of shape (F, 1).
    */
   N = nrow(X)
-  F = nrow(W)
+  F = nrow(b)
 
   /*
    * conv2d_backward_filter takes the input and delta map as first and
-   * second args, respectively. Given that, we need to compute the
+   * second args, respectively. Given that we need to compute the
    * grad (wrt to filter) for transpose convolution where the roles of
    * the input and output are reversed, we reverse the order of the
    * args (along with setting input_shape to the delta map shape).
@@ -134,7 +134,7 @@ backward = function(matrix[double] dout, int Hout, int Wout,
    * applying it.
    */
   dW = conv2d_backward_filter(dout, X, stride=[strideh,stridew], padding=[padh,padw],
-                              input_shape=[N,C,Hout,Wout], filter_shape=[F,C,Hf,Wf])
+                              input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
 
   /*
    * Since the forward for transpose convolution makes a call to
@@ -147,7 +147,7 @@ backward = function(matrix[double] dout, int Hout, int Wout,
    * and the filter, keep in mind that the forward function rotates the
    * filter by 180 degrees before applying it.
    */
-  dX = conv2d(dout, W, input_shape=[N,C,Hout,Wout], filter_shape=[F,C,Hf,Wf],
+  dX = conv2d(dout, W, input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf],
               stride=[strideh,stridew], padding=[padh,padw])
 
   db = rowSums(matrix(colSums(dout), rows=F, cols=Hout*Wout))
@@ -171,10 +171,10 @@ init = function(int F, int C, int Hf, int Wf)
    *  - Wf: Filter width.
    *
    * Outputs:
-   *  - W: Weights, of shape (F, C*Hf*Wf).
+   *  - W: Weights, of shape (C, F*Hf*Wf).
    *  - b: Biases, of shape (F, 1).
    */
-  W = rand(rows=F, cols=C*Hf*Wf, pdf="normal") * sqrt(2/(C*Hf*Wf))
+  W = rand(rows=C, cols=F*Hf*Wf, pdf="normal") * sqrt(2/(C*Hf*Wf))
   b = matrix(0, rows=F, cols=1)
 }
 
@@ -187,7 +187,7 @@ init_bilinear = function(int C, int K)
    * channel-wise independent kernels of size K = 2f - f%2,
    * stride = f and pad = ceil((f-1)/2). The weights are set
    * via bilinear interpolation, bias is set to 0.
-   * 
+   *
    * Inputs:
    *  - C: Number of input channels (dimensionality of depth).
    *  - K: Kernel size (upsampling requires a square filter
@@ -202,7 +202,7 @@ init_bilinear = function(int C, int K)
   vect = 1 - abs(seq(0, K-1) / factor_up - center)
   weights = matrix(vect %*% t(vect), rows=1, cols=K*K)
 
-  /* 
+  /*
    * To create a multi-channel channel-independent upsampling filter,
    * we need to intersperse the filter weights with 0s. For instance,
    * consider the case of 2X upsampling. In this case, K=4 and we have
@@ -232,7 +232,7 @@ init_bilinear = function(int C, int K)
      * resulting row C times
      */
     repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))
-    
+
     /*
      * The above operation added extra C*K*K trailing 0s in the last row
      * that we do not need. Thus, we need to:

http://git-wip-us.apache.org/repos/asf/systemml/blob/324bea5d/scripts/nn/test/grad_check.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/test/grad_check.dml b/scripts/nn/test/grad_check.dml
index 5e57081..48e470c 100644
--- a/scripts/nn/test/grad_check.dml
+++ b/scripts/nn/test/grad_check.dml
@@ -619,57 +619,53 @@ conv2d_simple = function() {
 
 conv2d_transpose = function() {
   /*
-   * Gradient check for the 2D convolution transpose layer.
+   * Gradient check for the 2D transpose convolutional layer.
    */
-  print("Grad checking the 2D convolution transpose layer with L2 loss.")
-
-  N = 2
-  C = 2
-  Hin = 3
-  Win = 3
-  F = 2
-  Hf = 3
-  Wf = 3
+  print("Grad checking the 2D transpose convolutional layer with L2 loss.")
+
+  # Generate data
+  N = 2  # num examples
+  C = 2  # num channels
+  Hin = 3  # input height
+  Win = 3  # input width
+  F = 2  # num filters
+  Hf = 3  # filter height
+  Wf = 3  # filter width
   stride = 2
   pad = 1
   out_pad = 1
-
   X = rand(rows=N, cols=C*Hin*Win)
 
-  [W,b] = conv2d_transpose::init(F, C, Hf, Wf)
-  
+  # Create layers
+  [W, b] = conv2d_transpose::init(F, C, Hf, Wf)
+
+  # Compute analytical gradients of loss wrt parameters
   [out, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                 pad, pad, out_pad, out_pad)
-  
   y = rand(rows=N, cols=F*Hout*Wout)
-  
   dout = l2_loss::backward(out,y)
-  
   [dX, dW, db] = conv2d_transpose::backward(dout, Hout, Wout, X, W, b, C, Hin, Win, Hf, Wf,
                                             stride, stride, pad, pad)
-  
+
+  # Grad check
   h = 1e-5
   print(" - Grad checking X.")
   for (i in 1:nrow(X)) {
     for (j in 1:ncol(X)) {
+      # Compute numerical derivative
       old = as.scalar(X[i,j])
       X[i,j] = old - h
-
       [outmh, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-
       lossmh = l2_loss::forward(outmh, y)
-	
       X[i,j] = old + h
       [outph, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-
       lossph = l2_loss::forward(outph, y)
+      X[i,j] = old  # reset
+      dX_num = (lossph-lossmh) / (2*h)  # numerical derivative
 
-      X[i,j] = old
-
-      dX_num = (lossph-lossmh) / (2*h)
-
+      # Check error
       rel_error = test_util::check_rel_grad_error(as.scalar(dX[i,j]), dX_num, lossph, lossmh)
     }
   }
@@ -677,24 +673,20 @@ conv2d_transpose = function() {
   print(" - Grad checking W.")
   for (i in 1:nrow(W)) {
     for (j in 1:ncol(W)) {
+      # Compute numerical derivative
       old = as.scalar(W[i,j])
       W[i,j] = old - h
-
       [outmh, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-      
       lossmh = l2_loss::forward(outmh, y)
-
       W[i,j] = old + h
       [outph, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-      
       lossph = l2_loss::forward(outph, y)
+      W[i,j] = old  # reset
+      dW_num = (lossph-lossmh) / (2*h)  # numerical derivative
 
-      W[i,j] = old
-
-      dW_num = (lossph-lossmh) / (2*h)
-
+      # Check error
       rel_error = test_util::check_rel_grad_error(as.scalar(dW[i,j]), dW_num, lossph, lossmh)
     }
   }
@@ -702,24 +694,20 @@ conv2d_transpose = function() {
   print(" - Grad checking b.")
   for (i in 1:nrow(b)) {
     for (j in 1:ncol(b)) {
+      # Compute numerical derivative
       old = as.scalar(b[i,j])
       b[i,j] = old - h
-
       [outmh, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-
       lossmh = l2_loss::forward(outmh, y)
-
       b[i,j] = old + h
       [outph, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
                                                       pad, pad, out_pad, out_pad)
-	
       lossph = l2_loss::forward(outph, y)
+      b[i,j] = old  # reset
+      db_num = (lossph-lossmh) / (2*h)  # numerical derivative
 
-      b[i,j] = old
-	
-      db_num = (lossph-lossmh) / (2*h)
-
+      # Check error
       rel_error = test_util::check_rel_grad_error(as.scalar(db[i,j]), db_num, lossph, lossmh)
     }
   }

http://git-wip-us.apache.org/repos/asf/systemml/blob/324bea5d/scripts/nn/test/test.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/test/test.dml b/scripts/nn/test/test.dml
index 0df7f0f..52fb063 100644
--- a/scripts/nn/test/test.dml
+++ b/scripts/nn/test/test.dml
@@ -111,36 +111,57 @@ conv2d = function() {
 
 conv2d_transpose = function() {
   /*
-   * Test for the 2D convolution transpose function.
+   * Test for the 2D transpose convolution function.
    */
-  print("Testing the 2D convolution transpose function.")
-
-  N = 1  
-  C = 1
-  Hin = 2
-  Win = 2
-  F = 1
-  Hf = 3
-  Wf = 3
+  print("Testing the 2D transpose convolution function.")
+
+  # Generate data
+  N = 2  # num examples
+  C = 3  # num channels
+  Hin = 2  # input height
+  Win = 2  # input width
+  F = 2  # num filters
+  Hf = 3  # filter height
+  Wf = 3  # filter width
   stride = 1
   pad = 0
-  out_pad = 0
-  
-  X = matrix(seq(1,N*C*Hin*Win), rows=N, cols=C*Hin*Win)
-  W = matrix(seq(1,F*C*Hf*Wf), rows=F, cols=C*Hf*Wf)
-  b = matrix(0, rows=F, cols=1)
+  out_pad = 0  # padding added to output
+  X = matrix(seq(1,N*C*Hin*Win), rows=N, cols=C*Hin*Win) / (N*C*Hin*Win) * 2 - 1  # normalized
+
+  # Create layer
+  W = matrix(seq(1,C*F*Hf*Wf), rows=C, cols=F*Hf*Wf) / (C*F*Hf*Wf) * 2 - 1  # normalized
+  b = matrix(seq(1,F), rows=F, cols=1) / F^2  # non-zero & non-one
 
   # Forward
-  [out, Hout, Wout] = 
-  	conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride, pad, pad, out_pad, out_pad)
+  [out, Hout, Wout] = conv2d_transpose::forward(X, W, b, C, Hin, Win, Hf, Wf, stride, stride,
+                                                pad, pad, out_pad, out_pad)
 
   # Equivalency check
-  target = matrix("1 4 7 6 7 23 33 24 19 53 63 42 21 52 59 36", rows=N, cols=C*Hout*Wout)
-  
+  target = matrix("1.21296299  2.03703713  1.91666663  1.02777779
+                   1.83333337  3.18518519  2.98148131  1.52777767
+                   1.5         2.57407403  2.37037039  1.24999988
+                   0.78703707  1.25925922  1.17592585  0.69444442
+
+                   0.87962961  1.20370364  1.08333337  0.77777773
+                   1.08333337  1.60185182  1.39814818  0.94444442
+                   0.75        0.99074072  0.78703701  0.66666657
+                   0.62037039  0.75925928  0.67592591  0.6111111
+
+
+                   0.32407406  0.37037039  0.47222221  0.36111113
+                   0.38888881  0.51851851  0.75925928  0.52777779
+                   0.72222215  1.24074078  1.48148155  0.91666669
+                   0.56481475  0.92592585  1.06481469  0.69444442
+
+                   0.99074078  1.53703713  1.63888896  1.11111116
+                   1.63888884  2.93518519  3.17592597  1.94444442
+                   1.97222221  3.65740728  3.89814806  2.33333325
+                   1.39814818  2.42592597  2.56481481  1.61111116", rows=N, cols=F*Hout*Wout)
+
   for (i in 1:nrow(out)) {
     for(j in 1:ncol(out)) {
-      rel_error = test_util::check_rel_error(as.scalar(out[1,i]),
-                                             as.scalar(target[1,i]), 1e-3, 1e-4)
+      rel_error = test_util::check_rel_error(as.scalar(out[i,j]),
+                                             as.scalar(target[i,j]), 1e-3, 1e-4)
     }
   }
 }


[2/3] systemml git commit: [MINOR] Update `nn` library formatting

Posted by du...@apache.org.
[MINOR] Update `nn` library formatting


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/d49ab981
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/d49ab981
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/d49ab981

Branch: refs/heads/master
Commit: d49ab98116054c98aa65a1f0d173a9c181b1f32f
Parents: 324bea5
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Thu Jun 15 15:20:47 2017 -0700
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Thu Jun 15 15:20:47 2017 -0700

----------------------------------------------------------------------
 scripts/nn/examples/mnist_lenet.dml    |   4 +-
 scripts/nn/layers/conv2d_transpose.dml | 149 +++++++++++++---------------
 2 files changed, 71 insertions(+), 82 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/d49ab981/scripts/nn/examples/mnist_lenet.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/examples/mnist_lenet.dml b/scripts/nn/examples/mnist_lenet.dml
index 986d797..57b8ba6 100644
--- a/scripts/nn/examples/mnist_lenet.dml
+++ b/scripts/nn/examples/mnist_lenet.dml
@@ -114,8 +114,8 @@ train = function(matrix[double] X, matrix[double] Y,
 
       # Compute forward pass
       ## layer 1: conv1 -> relu1 -> pool1
-      [outc1, Houtc1, Woutc1] = conv2d::forward(X_batch, W1, b1, C, Hin, Win, Hf, Wf, stride, stride,
-                                                pad, pad)
+      [outc1, Houtc1, Woutc1] = conv2d::forward(X_batch, W1, b1, C, Hin, Win, Hf, Wf,
+                                                stride, stride, pad, pad)
       outr1 = relu::forward(outc1)
       [outp1, Houtp1, Woutp1] = max_pool2d::forward(outr1, F1, Houtc1, Woutc1, Hf=2, Wf=2,
                                                     strideh=2, stridew=2, pad=0, pad=0)

http://git-wip-us.apache.org/repos/asf/systemml/blob/d49ab981/scripts/nn/layers/conv2d_transpose.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/conv2d_transpose.dml b/scripts/nn/layers/conv2d_transpose.dml
index 0838563..eee19a5 100644
--- a/scripts/nn/layers/conv2d_transpose.dml
+++ b/scripts/nn/layers/conv2d_transpose.dml
@@ -63,25 +63,24 @@ forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
   Hout = strideh * (Hin-1) - 2*padh + Hf + out_padh
   Wout = stridew * (Win-1) - 2*padw + Wf + out_padw
 
-  /*
-   * Transpose convolution aims to go in the other direction of
-   * (direct) convolution, i.e., given input X, produce output O such
-   * that running convolution on O recovers X. This is achieved by
-   * conv2d_backward_data (since the derivative wrt data must produce
-   * output of same size as the input to conv2d). By reusing a built-in
-   * operator we achieve efficiency and restrict the number of built-in
-   * operators to manageable levels. Plus, most other deep-learning
-   * packages make use of the same strategy which means this
-   * implementation of transpose convolution is 'in-sync' with them.
-   *
-   * One potential downside of reusing conv2d_backward_data is the fact
-   * that it rotates the filter by 180 degrees before applying it. This
-   * needs to be kept in mind when interpreting the output of transpose
-   * convolution.
-   */
+  # Transpose convolution aims to go in the other direction of
+  # (direct) convolution, i.e., given input X, produce output O such
+  # that running convolution on O recovers X. This is achieved by
+  # conv2d_backward_data (since the derivative wrt data must produce
+  # output of same size as the input to conv2d). By reusing a built-in
+  # operator we achieve efficiency and restrict the number of built-in
+  # operators to manageable levels. Plus, most other deep-learning
+  # packages make use of the same strategy which means this
+  # implementation of transpose convolution is 'in-sync' with them.
+  #
+  # One potential downside of reusing conv2d_backward_data is the fact
+  # that it rotates the filter by 180 degrees before applying it. This
+  # needs to be kept in mind when interpreting the output of transpose
+  # convolution.
   out = conv2d_backward_data(W, X, stride=[strideh,stridew], padding=[padh,padw],
                              input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
 
+  # Add bias term to each output filter
   out = bias_add(out, b)
 }
 
@@ -120,36 +119,33 @@ backward = function(matrix[double] dout, int Hout, int Wout,
   N = nrow(X)
   F = nrow(b)
 
-  /*
-   * conv2d_backward_filter takes the input and delta map as first and
-   * second args, respectively. Given that we need to compute the
-   * grad (wrt to filter) for transpose convolution where the roles of
-   * the input and output are reversed, we reverse the order of the
-   * args (along with setting input_shape to the delta map shape).
-   * Effectively, we are running a direct convolution with X as the
-   * filter and the dout as the input. To convince oneself that the
-   * interconnections between the cells of the filter, input and delta
-   * map are preserved please keep in mind that the forward of
-   * convolution transpose rotates the filter by 180 degrees before
-   * applying it.
-   */
+  # conv2d_backward_filter takes the input and delta map as first and
+  # second args, respectively. Given that we need to compute the
+  # grad (wrt to filter) for transpose convolution where the roles of
+  # the input and output are reversed, we reverse the order of the
+  # args (along with setting input_shape to the delta map shape).
+  # Effectively, we are running a direct convolution with X as the
+  # filter and the dout as the input. To convince oneself that the
+  # interconnections between the cells of the filter, input and delta
+  # map are preserved please keep in mind that the forward of
+  # convolution transpose rotates the filter by 180 degrees before
+  # applying it.
   dW = conv2d_backward_filter(dout, X, stride=[strideh,stridew], padding=[padh,padw],
                               input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf])
 
-  /*
-   * Since the forward for transpose convolution makes a call to
-   * conv2d_backward_data, to compute its derivative wrt to data
-   * we can run conv2d by applying the filter on the delta
-   * map (this makes sense because convolution transpose is the
-   * 'reverse' of convolution). Its easy to see that this will produce
-   * output of the required size. To convince oneself that conv2d will
-   * respect the interconnections between the cells in the delta map
-   * and the filter, keep in mind that the forward function rotates the
-   * filter by 180 degrees before applying it.
-   */
+  # Since the forward for transpose convolution makes a call to
+  # conv2d_backward_data, to compute its derivative wrt to data
+  # we can run conv2d by applying the filter on the delta
+  # map (this makes sense because convolution transpose is the
+  # 'reverse' of convolution). Its easy to see that this will produce
+  # output of the required size. To convince oneself that conv2d will
+  # respect the interconnections between the cells in the delta map
+  # and the filter, keep in mind that the forward function rotates the
+  # filter by 180 degrees before applying it.
   dX = conv2d(dout, W, input_shape=[N,F,Hout,Wout], filter_shape=[C,F,Hf,Wf],
               stride=[strideh,stridew], padding=[padh,padw])
 
+  # Partial derivatives for bias vector
   db = rowSums(matrix(colSums(dout), rows=F, cols=Hout*Wout))
 }
 
@@ -202,47 +198,40 @@ init_bilinear = function(int C, int K)
   vect = 1 - abs(seq(0, K-1) / factor_up - center)
   weights = matrix(vect %*% t(vect), rows=1, cols=K*K)
 
-  /*
-   * To create a multi-channel channel-independent upsampling filter,
-   * we need to intersperse the filter weights with 0s. For instance,
-   * consider the case of 2X upsampling. In this case, K=4 and we have
-   * K^2=16 weights to include into the 3D tensor representing the
-   * filter which should look like the following (assuming 3 channels):
-   *
-   *   <-16 weights-> <---------32 0s--------->
-   *   X X ...... X X 0 0 0 ............. 0 0 0
-   *   0 .......... 0 X X .... X X 0 ...... 0 0
-   *   0 0 0 ............... 0 0 0 X X .... X X
-   *
-   * To be clear, the second row should have 16 0s followed by 16
-   * weights followed by 16 0s.
-   *
-   * To create the above filter, we take advantage of the fact that
-   * between two sets of non-zero weights, there is always a sequence
-   * of C*K*K 0s. In the above example, C*K^2 = 48 (e.g., 32 trailing
-   * 0s in the first row and 16 leading 0s in the second row).
-   *
-   * Note that, in the special case of C=1 we do not need to
-   * intersperse with 0s (no question of being channel-wise independent
-   * since we have only 1 channel).
-   */
-  #if(C > 1){
-    /*
-     * Append C*K*K trailing 0s to the K*K kernel and replicate the
-     * resulting row C times
-     */
-    repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))
+  # To create a multi-channel channel-independent upsampling filter,
+  # we need to intersperse the filter weights with 0s. For instance,
+  # consider the case of 2X upsampling. In this case, K=4 and we have
+  # K^2=16 weights to include into the 3D tensor representing the
+  # filter which should look like the following (assuming 3 channels):
+  #
+  #   <-16 weights-> <---------32 0s--------->
+  #   X X ...... X X 0 0 0 ............. 0 0 0
+  #   0 .......... 0 X X .... X X 0 ...... 0 0
+  #   0 0 0 ............... 0 0 0 X X .... X X
+  #
+  # To be clear, the second row should have 16 0s followed by 16
+  # weights followed by 16 0s.
+  #
+  # To create the above filter, we take advantage of the fact that
+  # between two sets of non-zero weights, there is always a sequence
+  # of C*K*K 0s. In the above example, C*K^2 = 48 (e.g., 32 trailing
+  # 0s in the first row and 16 leading 0s in the second row).
+  #
+  # Note that, in the special case of C=1 we do not need to
+  # intersperse with 0s (no question of being channel-wise independent
+  # since we have only 1 channel).
+
+  # Append C*K*K trailing 0s to the K*K kernel and replicate the
+  # resulting row C times
+  repl_weights = matrix(1, rows=C, cols=1) %*% cbind(weights, matrix(0, rows=1, cols=C*K*K))
 
-    /*
-     * The above operation added extra C*K*K trailing 0s in the last row
-     * that we do not need. Thus, we need to:
-     *   1) reshape the resulting matrix into a row
-     *   2) 'Clip off' the last few 0s using indexing and reshape the
-     *      result into the expected filter shape ([C, C, K, K])
-     */
-    repl_weights_row = matrix(repl_weights, rows=1, cols=C*(C+1)*K^2)
-    W = matrix(repl_weights_row[1,1:(C*K)^2], rows=C, cols=C*K^2)
-  #}else W = weights
+  # The above operation added extra C*K*K trailing 0s in the last row
+  # that we do not need. Thus, we need to:
+  #   1) reshape the resulting matrix into a row
+  #   2) 'Clip off' the last few 0s using indexing and reshape the
+  #      result into the expected filter shape ([C, C, K, K])
+  repl_weights_row = matrix(repl_weights, rows=1, cols=C*(C+1)*K^2)
+  W = matrix(repl_weights_row[1,1:(C*K)^2], rows=C, cols=C*K^2)
 
   b = matrix(0, rows=C, cols=1)
 }


[3/3] systemml git commit: [MINOR] Make use of util::channel_sums function in conv2d_builtin

Posted by du...@apache.org.
[MINOR] Make use of util::channel_sums function in conv2d_builtin
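
For reference, a small sketch of the equivalence (assuming, as the
replaced expression suggests, that `util::channel_sums` in `nn/util.dml`
sums a `(N, F*Hout*Wout)` matrix per channel into an `(F, 1)` vector):

  source("nn/util.dml") as util
  db_old = rowSums(matrix(colSums(dout), rows=F, cols=Hout*Wout))  # previous formulation
  db_new = util::channel_sums(dout, F, Hout, Wout)                 # same per-filter sums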


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/17838a3d
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/17838a3d
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/17838a3d

Branch: refs/heads/master
Commit: 17838a3d34d6e3fa860222ff9af2e5c17f6c3a77
Parents: d49ab98
Author: Mike Dusenberry <mw...@us.ibm.com>
Authored: Thu Jun 15 15:20:49 2017 -0700
Committer: Mike Dusenberry <mw...@us.ibm.com>
Committed: Thu Jun 15 15:20:49 2017 -0700

----------------------------------------------------------------------
 scripts/nn/layers/conv2d_builtin.dml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/17838a3d/scripts/nn/layers/conv2d_builtin.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/conv2d_builtin.dml b/scripts/nn/layers/conv2d_builtin.dml
index 6b066eb..141890e 100644
--- a/scripts/nn/layers/conv2d_builtin.dml
+++ b/scripts/nn/layers/conv2d_builtin.dml
@@ -24,6 +24,7 @@
  *
  * This implementation uses a built-in operator for higher performance.
  */
+source("nn/util.dml") as util
 
 forward = function(matrix[double] X, matrix[double] W, matrix[double] b,
                    int C, int Hin, int Win, int Hf, int Wf,
@@ -127,7 +128,7 @@ backward = function(matrix[double] dout, int Hout, int Wout,
                             input_shape=[N,C,Hin,Win], filter_shape=[F,C,Hf,Wf])
 
   # Partial derivatives for bias vector
-  db = rowSums(matrix(colSums(dout), rows=F, cols=Hout*Wout))
+  db = util::channel_sums(dout, F, Hout, Wout)
 }
 
 init = function(int F, int C, int Hf, int Wf)