Posted to commits@systemml.apache.org by ni...@apache.org on 2017/09/28 17:57:13 UTC
[2/2] systemml git commit: [SYSTEMML-1493] [SYSTEMML-1500] Added TanH
and Euclidean Loss in Caffe2DML
[SYSTEMML-1493] [SYSTEMML-1500] Added TanH and Euclidean Loss in Caffe2DML
- Added the reference documentation
- Added a new layer softmax_loss.dml
- Added compute_loss_accuracy to l2_loss.dml
- Updated tanh layer to invoke newly added builtin function
Closes #672.
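For orientation, the two additions compute roughly the following. This is a hedged NumPy sketch, not the DML in `tanh.dml` or `l2_loss.dml`; in particular the 1/(2N) scaling of the loss is an assumption to verify against the actual scripts:

```python
import numpy as np

def tanh_forward(X):
    # TanH activation, applied elementwise.
    return np.tanh(X)

def tanh_backward(dout, X):
    # d/dx tanh(x) = 1 - tanh(x)^2, chained with the upstream gradient.
    out = np.tanh(X)
    return dout * (1.0 - out ** 2)

def euclidean_loss(pred, y):
    # Euclidean (L2) loss; the 1/(2N) normalization is an assumption here.
    N = pred.shape[0]
    return np.sum((pred - y) ** 2) / (2.0 * N)
```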
Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/61dcc85e
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/61dcc85e
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/61dcc85e
Branch: refs/heads/master
Commit: 61dcc85e48a390c1bb63ee4c42aad9a3fade7d06
Parents: b5ef21f
Author: Niketan Pansare <np...@us.ibm.com>
Authored: Thu Sep 28 10:54:56 2017 -0700
Committer: Niketan Pansare <np...@us.ibm.com>
Committed: Thu Sep 28 10:56:03 2017 -0700
----------------------------------------------------------------------
docs/beginners-guide-caffe2dml.md | 539 +---------
docs/index.md | 4 +-
docs/reference-guide-caffe2dml.md | 986 +++++++++++++++++++
scripts/nn/layers/l2_loss.dml | 1 -
scripts/nn/layers/tanh.dml | 15 +-
scripts/nn/test/run_tests.dml | 2 +
scripts/nn/test/test.dml | 53 +
.../org/apache/sysml/api/dl/CaffeLayer.scala | 131 ++-
.../org/apache/sysml/api/dl/CaffeNetwork.scala | 3 +
9 files changed, 1171 insertions(+), 563 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/beginners-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/docs/beginners-guide-caffe2dml.md b/docs/beginners-guide-caffe2dml.md
index 220e02c..4d6b7fd 100644
--- a/docs/beginners-guide-caffe2dml.md
+++ b/docs/beginners-guide-caffe2dml.md
@@ -27,12 +27,10 @@ limitations under the License.
<br/>
-## Introduction
-
-Caffe2DML is an **experimental API** that converts an Caffe specification to DML.
+Caffe2DML is an **experimental API** that converts a Caffe specification to DML.
It is designed to fit well into the mllearn framework and hence supports NumPy, Pandas, and PySpark DataFrames.
-### Training Lenet
+# Training Lenet
To create a Caffe2DML object, one needs to create a solver and network file that conforms
to the [Caffe specification](http://caffe.berkeleyvision.org/).
@@ -148,7 +146,7 @@ Iter:2000, validation loss:173.66147359346, validation accuracy:97.4897540983606
0.97399999999999998
```
-### Additional Configuration
+# Additional Configuration
- Print the generated DML script along with classification report: `lenet.set(debug=True)`
- Print the heavy hitters instruction and the execution plan (advanced users): `lenet.setStatistics(True).setExplain(True)`
@@ -168,7 +166,7 @@ and `allreduce`). Here are some common settings:
| Distributed prediction | `lenet.set(test_algo="allreduce")` | |
| Distributed synchronous training | `lenet.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)` | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-### Saving the trained model
+# Saving the trained model
```python
lenet.fit(X_train, y_train)
@@ -178,7 +176,7 @@ new_lenet.load('trained_weights')
new_lenet.score(X_test, y_test)
```
-### Loading a pretrained caffemodel
+# Loading a pretrained caffemodel
We provide a converter utility to convert `.caffemodel` trained using Caffe to SystemML format.
@@ -210,529 +208,4 @@ vgg.predict(X_test)
# OR Fine-Tuning: vgg.fit(X_train, y_train)
```
-## Frequently asked questions
-
-#### What is the purpose of Caffe2DML API ?
-
-Most deep learning experts are more likely to be familiar with Caffe's specification
-than with the DML language. For these users, the Caffe2DML API reduces the learning curve of using SystemML.
-Instead of requiring the users to write a DML script for training, fine-tuning and testing the model,
-Caffe2DML takes as an input a network and solver specified in the Caffe specification
-and automatically generates the corresponding DML.
-
-#### With Caffe2DML, does SystemML now require Caffe to be installed ?
-
-Absolutely not. We only support Caffe's API for the convenience of the user, as stated above.
-Since Caffe's API is specified in the protobuf format, we are able to generate the Java parser files
-and do not require Caffe to be installed. This is also true for the TensorBoard feature of Caffe2DML.
-
-```
-Dml.g4 ---> antlr ---> DmlLexer.java, DmlListener.java, DmlParser.java ---> parse foo.dml
-caffe.proto ---> protoc ---> target/generated-sources/caffe/Caffe.java ---> parse caffe_network.proto, caffe_solver.proto
-```
-
-Again, the SystemML engine does not invoke (or depend on) Caffe for any of its runtime operators.
-Since the grammar files for the respective APIs (i.e. `caffe.proto`) are used by SystemML,
-we include their licenses in our jar files.
-
-#### How can I speedup the training with Caffe2DML ?
-
-- Enable native BLAS to improve the performance of CP convolution and matrix multiplication operators.
-If you are using OpenBLAS, please ensure that it was built with the `USE_OPENMP` flag turned on.
-For more details, see http://apache.github.io/systemml/native-backend
-
-```python
-caffe2dmlObject.setConfigProperty("sysml.native.blas", "auto")
-```
-
-- Turn on the experimental codegen feature. This should help reduce unnecessary allocation cost after every binary operation.
-
-```python
-caffe2dmlObject.setConfigProperty("sysml.codegen.enabled", "true").setConfigProperty("sysml.codegen.plancache", "true")
-```
-
-- Tune the [Garbage Collector](http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning).
-
-- Enable GPU support (described below).
-
-#### How to enable GPU support in Caffe2DML ?
-
-To be consistent with other mllearn algorithms, we recommend that you use the following methods instead of setting
-the `solver_mode` in the solver file.
-
-```python
-# The below method tells SystemML optimizer to use a GPU-enabled instruction if the operands fit in the GPU memory
-caffe2dmlObject.setGPU(True)
-# The below method tells SystemML optimizer to always use a GPU-enabled instruction irrespective of the memory requirement
-caffe2dmlObject.setForceGPU(True)
-```
-
-#### What is lr_policy in the solver specification ?
-
-The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML supports the following policies:
-- `fixed`: always return `base_lr`.
-- `step`: return `base_lr * gamma ^ (floor(iter / step))`
-- `exp`: return `base_lr * gamma ^ iter`
-- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
-- `poly`: the effective learning rate follows a polynomial decay, reaching zero at max_iter. return `base_lr * (1 - iter/max_iter) ^ power`
-- `sigmoid`: the effective learning rate follows a sigmoid decay. return `base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))`
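These decay rules are plain arithmetic; a small, hypothetical Python helper (the function name and defaults are ours, not part of Caffe2DML) makes them concrete:

```python
import math

def learning_rate(policy, base_lr, itr, gamma=0.95, step=100,
                  power=1.0, max_iter=2000, stepsize=100):
    """Learning-rate decay schedules matching Caffe's lr_policy formulas."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** math.floor(itr / step)
    if policy == "exp":
        return base_lr * gamma ** itr
    if policy == "inv":
        return base_lr * (1 + gamma * itr) ** (-power)
    if policy == "poly":
        return base_lr * (1 - itr / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (itr - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)
```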
-
-#### How to set batch size ?
-
-Batch size is set in `data_param` of the Data layer:
-
-```
-layer {
- name: "mnist"
- type: "Data"
- top: "data"
- top: "label"
- data_param {
- source: "mnist_train"
- batch_size: 64
- backend: LMDB
- }
-}
-```
-
-#### How to set maximum number of iterations for training ?
-
-The maximum number of iterations can be set in the solver specification:
-
-```bash
-# The maximum number of iterations
-max_iter: 2000
-```
-
-#### How to set the size of the validation dataset ?
-
-The size of the validation dataset is determined by the parameter `test_iter` and the batch size. For example: if the batch size is 64 and
-`test_iter` is 10, then the validation size is 640. This setting internally generates the following DML code:
-
-```python
-num_images = nrow(y_full)
-BATCH_SIZE = 64
-num_validation = 10 * BATCH_SIZE
-X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
-X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
-num_images = nrow(y)
-```
-
-#### How to monitor loss via command-line ?
-
-To monitor loss, set the following parameters in the solver specification:
-
-```
-# Display training loss and accuracy every 100 iterations
-display: 100
-# Carry out validation every 500 training iterations and display validation loss and accuracy.
-test_iter: 10
-test_interval: 500
-```
-
-#### How to pass a single jpeg image to Caffe2DML for prediction ?
-
-To convert a JPEG image into a NumPy matrix, you can use the [pillow package](https://pillow.readthedocs.io/) and
-SystemML's `convertImageToNumPyArr` utility function. The below pyspark code demonstrates the usage:
-
-```python
-from PIL import Image
-import systemml as sml
-from systemml.mllearn import Caffe2DML
-img_shape = (3, 224, 224)
-input_image = sml.convertImageToNumPyArr(Image.open(img_file_path), img_shape=img_shape)
-resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='ResNet_50_pretrained_weights', input_shape=img_shape)
-resnet.predict(input_image)
-```
-
-#### How to prepare a directory of jpeg images for training with Caffe2DML ?
-
-The below pyspark code assumes that the input dataset has two labels, `cat` and `dog`, and that each filename has its label as a prefix.
-We iterate through the directory and convert each JPEG image into a pyspark.ml.linalg.Vector using pyspark.
-These vectors are stored as a DataFrame and shuffled using Spark SQL's `orderBy(rand())` function.
-The DataFrame is then saved in Parquet format to reduce the cost of preprocessing for repeated training.
-
-```python
-from systemml.mllearn import Caffe2DML
-from pyspark.sql import SQLContext
-import numpy as np
-import urllib, os, scipy.ndimage
-from pyspark.ml.linalg import Vectors
-from pyspark import StorageLevel
-import systemml as sml
-from pyspark.sql.functions import rand
-# ImageNet specific parameters
-img_shape = (3, 224, 224)
-train_dir = '/home/biuser/dogs_vs_cats/train'
-def getLabelFeatures(filename):
- from PIL import Image
- vec = Vectors.dense(sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])
- if filename.lower().startswith('cat'):
- return (1, vec)
- elif filename.lower().startswith('dog'):
- return (2, vec)
- else:
- raise ValueError('Expected the filename to start with either cat or dog')
-list_jpeg_files = os.listdir(train_dir)
-# 10 files per partition
-train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : getLabelFeatures(filename)).toDF(['label', 'features']).orderBy(rand())
-# Optional, but helps separate the conversion cost from the training cost
-# Alternatively, this dataframe can be passed directly to `caffe2dml_model.fit(train_df)`
-train_df.write.parquet('kaggle-cats-dogs.parquet')
-```
-
-An alternative way to load images into a PySpark DataFrame for prediction is to use MLlib's LabeledPoint class:
-
-```python
-from pyspark.mllib.regression import LabeledPoint
-from pyspark.mllib.util import MLUtils
-list_jpeg_files = os.listdir(train_dir)
-train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : LabeledPoint(0, sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])).toDF().select('features')
-# Note: convertVectorColumnsToML has an additional serialization cost
-train_df = MLUtils.convertVectorColumnsToML(train_df)
-```
-
-
-#### Can I use Caffe2DML via Scala ?
-
-Though we recommend using Caffe2DML via its Python interfaces, it is possible to use it by creating an object of the class
-`org.apache.sysml.api.dl.Caffe2DML`. Note that Caffe2DML's Scala API is packaged in `systemml-*-extra.jar`.
-
-#### How can I get summary information of my network ?
-
-
-```python
-lenet.summary()
-```
-
-Output:
-
-```
-+-----+---------------+--------------+------------+---------+-----------+---------+
-| Name| Type| Output| Weight| Bias| Top| Bottom|
-+-----+---------------+--------------+------------+---------+-----------+---------+
-|mnist| Data| (, 1, 28, 28)| | |mnist,mnist| |
-|conv1| Convolution|(, 32, 28, 28)| [32 X 25]| [32 X 1]| conv1| mnist|
-|relu1| ReLU|(, 32, 28, 28)| | | relu1| conv1|
-|pool1| Pooling|(, 32, 14, 14)| | | pool1| relu1|
-|conv2| Convolution|(, 64, 14, 14)| [64 X 800]| [64 X 1]| conv2| pool1|
-|relu2| ReLU|(, 64, 14, 14)| | | relu2| conv2|
-|pool2| Pooling| (, 64, 7, 7)| | | pool2| relu2|
-| ip1| InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]| ip1| pool2|
-|relu3| ReLU| (, 512, 1, 1)| | | relu3| ip1|
-|drop1| Dropout| (, 512, 1, 1)| | | drop1| relu3|
-| ip2| InnerProduct| (, 10, 1, 1)| [512 X 10]| [1 X 10]| ip2| drop1|
-| loss|SoftmaxWithLoss| (, 10, 1, 1)| | | loss|ip2,mnist|
-+-----+---------------+--------------+------------+---------+-----------+---------+
-```
-
-#### How can I view the script generated by Caffe2DML ?
-
-To view the generated DML script (and additional debugging information), please set the `debug` parameter to True.
-
-```python
-lenet.set(debug=True)
-```
-
-Output:
-```
-001|debug = TRUE
-002|source("nn/layers/softmax.dml") as softmax
-003|source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss
-004|source("nn/layers/conv2d_builtin.dml") as conv2d_builtin
-005|source("nn/layers/relu.dml") as relu
-006|source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin
-007|source("nn/layers/affine.dml") as affine
-008|source("nn/layers/dropout.dml") as dropout
-009|source("nn/optim/sgd_momentum.dml") as sgd_momentum
-010|source("nn/layers/l2_reg.dml") as l2_reg
-011|X_full_path = ifdef($X, " ")
-012|X_full = read(X_full_path)
-013|y_full_path = ifdef($y, " ")
-014|y_full = read(y_full_path)
-015|num_images = nrow(y_full)
-016|# Convert to one-hot encoding (Assumption: 1-based labels)
-017|y_full = table(seq(1,num_images,1), y_full, num_images, 10)
-018|weights = ifdef($weights, " ")
-019|# Initialize the layers and solvers
-020|X_full = X_full * 0.00390625
-021|BATCH_SIZE = 64
-022|[conv1_weight,conv1_bias] = conv2d_builtin::init(32,1,5,5)
-023|[conv2_weight,conv2_bias] = conv2d_builtin::init(64,32,5,5)
-024|[ip1_weight,ip1_bias] = affine::init(3136,512)
-025|[ip2_weight,ip2_bias] = affine::init(512,10)
-026|conv1_weight_v = sgd_momentum::init(conv1_weight)
-027|conv1_bias_v = sgd_momentum::init(conv1_bias)
-028|conv2_weight_v = sgd_momentum::init(conv2_weight)
-029|conv2_bias_v = sgd_momentum::init(conv2_bias)
-030|ip1_weight_v = sgd_momentum::init(ip1_weight)
-031|ip1_bias_v = sgd_momentum::init(ip1_bias)
-032|ip2_weight_v = sgd_momentum::init(ip2_weight)
-033|ip2_bias_v = sgd_momentum::init(ip2_bias)
-034|num_validation = 10 * BATCH_SIZE
-035|# Sanity check to ensure that validation set is not too large
-036|if(num_validation > ceil(0.3 * num_images)) {
-037| max_test_iter = floor(ceil(0.3 * num_images) / BATCH_SIZE)
-038| stop("Too large validation size. Please reduce test_iter to " + max_test_iter)
-039|}
-040|X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]; X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]; num_images = nrow(y)
-041|num_iters_per_epoch = ceil(num_images / BATCH_SIZE)
-042|max_epochs = ceil(2000 / num_iters_per_epoch)
-043|iter = 0
-044|lr = 0.01
-045|for(e in 1:max_epochs) {
-046| for(i in 1:num_iters_per_epoch) {
-047| beg = ((i-1) * BATCH_SIZE) %% num_images + 1; end = min(beg + BATCH_SIZE - 1, num_images); Xb = X[beg:end,]; yb = y[beg:end,];
-048| iter = iter + 1
-049| # Perform forward pass
-050| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
-051| out4 = relu::forward(out3)
-052| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
-053| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
-054| out7 = relu::forward(out6)
-055| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
-056| out9 = affine::forward(out8,ip1_weight,ip1_bias)
-057| out10 = relu::forward(out9)
-058| [out11,mask11] = dropout::forward(out10,0.5,-1)
-059| out12 = affine::forward(out11,ip2_weight,ip2_bias)
-060| out13 = softmax::forward(out12)
-061| # Perform backward pass
-062| dProbs = cross_entropy_loss::backward(out13,yb); dOut13 = softmax::backward(dProbs,out12); dOut13_12 = dOut13; dOut13_2 = dOut13;
-063| [dOut12,ip2_dWeight,ip2_dBias] = affine::backward(dOut13_12,out11,ip2_weight,ip2_bias); dOut12_11 = dOut12;
-064| dOut11 = dropout::backward(dOut12_11,out10,0.5,mask11); dOut11_10 = dOut11;
-065| dOut10 = relu::backward(dOut11_10,out9); dOut10_9 = dOut10;
-066| [dOut9,ip1_dWeight,ip1_dBias] = affine::backward(dOut10_9,out8,ip1_weight,ip1_bias); dOut9_8 = dOut9;
-067| dOut8 = max_pool2d_builtin::backward(dOut9_8,7,7,out7,64,14,14,2,2,2,2,0,0); dOut8_7 = dOut8;
-068| dOut7 = relu::backward(dOut8_7,out6); dOut7_6 = dOut7;
-069| [dOut6,conv2_dWeight,conv2_dBias] = conv2d_builtin::backward(dOut7_6,14,14,out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2); dOut6_5 = dOut6;
-070| dOut5 = max_pool2d_builtin::backward(dOut6_5,14,14,out4,32,28,28,2,2,2,2,0,0); dOut5_4 = dOut5;
-071| dOut4 = relu::backward(dOut5_4,out3); dOut4_3 = dOut4;
-072| [dOut3,conv1_dWeight,conv1_dBias] = conv2d_builtin::backward(dOut4_3,28,28,Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2); dOut3_2 = dOut3;
-073| # Update the parameters
-074| conv1_dWeight_reg = l2_reg::backward(conv1_weight, 5.000000237487257E-4)
-075| conv1_dWeight = conv1_dWeight + conv1_dWeight_reg
-076| [conv1_weight,conv1_weight_v] = sgd_momentum::update(conv1_weight,conv1_dWeight,(lr * 1.0),0.8999999761581421,conv1_weight_v)
-077| [conv1_bias,conv1_bias_v] = sgd_momentum::update(conv1_bias,conv1_dBias,(lr * 2.0),0.8999999761581421,conv1_bias_v)
-078| conv2_dWeight_reg = l2_reg::backward(conv2_weight, 5.000000237487257E-4)
-079| conv2_dWeight = conv2_dWeight + conv2_dWeight_reg
-080| [conv2_weight,conv2_weight_v] = sgd_momentum::update(conv2_weight,conv2_dWeight,(lr * 1.0),0.8999999761581421,conv2_weight_v)
-081| [conv2_bias,conv2_bias_v] = sgd_momentum::update(conv2_bias,conv2_dBias,(lr * 2.0),0.8999999761581421,conv2_bias_v)
-082| ip1_dWeight_reg = l2_reg::backward(ip1_weight, 5.000000237487257E-4)
-083| ip1_dWeight = ip1_dWeight + ip1_dWeight_reg
-084| [ip1_weight,ip1_weight_v] = sgd_momentum::update(ip1_weight,ip1_dWeight,(lr * 1.0),0.8999999761581421,ip1_weight_v)
-085| [ip1_bias,ip1_bias_v] = sgd_momentum::update(ip1_bias,ip1_dBias,(lr * 2.0),0.8999999761581421,ip1_bias_v)
-086| ip2_dWeight_reg = l2_reg::backward(ip2_weight, 5.000000237487257E-4)
-087| ip2_dWeight = ip2_dWeight + ip2_dWeight_reg
-088| [ip2_weight,ip2_weight_v] = sgd_momentum::update(ip2_weight,ip2_dWeight,(lr * 1.0),0.8999999761581421,ip2_weight_v)
-089| [ip2_bias,ip2_bias_v] = sgd_momentum::update(ip2_bias,ip2_dBias,(lr * 2.0),0.8999999761581421,ip2_bias_v)
-090| # Compute training loss & accuracy
-091| if(iter %% 100 == 0) {
-092| loss = 0
-093| accuracy = 0
-094| tmp_loss = cross_entropy_loss::forward(out13,yb)
-095| loss = loss + tmp_loss
-096| true_yb = rowIndexMax(yb)
-097| predicted_yb = rowIndexMax(out13)
-098| accuracy = mean(predicted_yb == true_yb)*100
-099| training_loss = loss
-100| training_accuracy = accuracy
-101| print("Iter:" + iter + ", training loss:" + training_loss + ", training accuracy:" + training_accuracy)
-102| if(debug) {
-103| num_rows_error_measures = min(10, ncol(yb))
-104| error_measures = matrix(0, rows=num_rows_error_measures, cols=5)
-105| for(class_i in 1:num_rows_error_measures) {
-106| tp = sum( (true_yb == predicted_yb) * (true_yb == class_i) )
-107| tp_plus_fp = sum( (predicted_yb == class_i) )
-108| tp_plus_fn = sum( (true_yb == class_i) )
-109| precision = tp / tp_plus_fp
-110| recall = tp / tp_plus_fn
-111| f1Score = 2*precision*recall / (precision+recall)
-112| error_measures[class_i,1] = class_i
-113| error_measures[class_i,2] = precision
-114| error_measures[class_i,3] = recall
-115| error_measures[class_i,4] = f1Score
-116| error_measures[class_i,5] = tp_plus_fn
-117| }
-118| print("class \tprecision\trecall \tf1-score\tnum_true_labels\n" + toString(error_measures, decimal=7, sep="\t"))
-119| }
-120| }
-121| # Compute validation loss & accuracy
-122| if(iter %% 500 == 0) {
-123| loss = 0
-124| accuracy = 0
-125| validation_loss = 0
-126| validation_accuracy = 0
-127| for(iVal in 1:num_iters_per_epoch) {
-128| beg = ((iVal-1) * BATCH_SIZE) %% num_validation + 1; end = min(beg + BATCH_SIZE - 1, num_validation); Xb = X_val[beg:end,]; yb = y_val[beg:end,];
-129| # Perform forward pass
-130| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
-131| out4 = relu::forward(out3)
-132| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
-133| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
-134| out7 = relu::forward(out6)
-135| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
-136| out9 = affine::forward(out8,ip1_weight,ip1_bias)
-137| out10 = relu::forward(out9)
-138| [out11,mask11] = dropout::forward(out10,0.5,-1)
-139| out12 = affine::forward(out11,ip2_weight,ip2_bias)
-140| out13 = softmax::forward(out12)
-141| tmp_loss = cross_entropy_loss::forward(out13,yb)
-142| loss = loss + tmp_loss
-143| true_yb = rowIndexMax(yb)
-144| predicted_yb = rowIndexMax(out13)
-145| accuracy = mean(predicted_yb == true_yb)*100
-146| validation_loss = validation_loss + loss
-147| validation_accuracy = validation_accuracy + accuracy
-148| }
-149| validation_accuracy = validation_accuracy / num_iters_per_epoch
-150| print("Iter:" + iter + ", validation loss:" + validation_loss + ", validation accuracy:" + validation_accuracy)
-151| }
-152| }
-153| # Learning rate
-154| lr = (0.009999999776482582 * 0.949999988079071^e)
-155|}
-
-Iter:100, training loss:0.24014199350958168, training accuracy:87.5
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-3.0000000 0.8888889 0.8888889 0.8888889 9.0000000
-4.0000000 0.7500000 0.7500000 0.7500000 4.0000000
-5.0000000 0.7500000 1.0000000 0.8571429 3.0000000
-6.0000000 0.8333333 1.0000000 0.9090909 5.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-8.0000000 0.8571429 0.7500000 0.8000000 8.0000000
-9.0000000 1.0000000 0.5714286 0.7272727 7.0000000
-10.0000000 0.7272727 0.8888889 0.8000000 9.0000000
-
-Iter:200, training loss:0.09555593867171894, training accuracy:98.4375
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-7.0000000 1.0000000 0.6666667 0.8000000 3.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-9.0000000 0.8571429 1.0000000 0.9230769 6.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-
-Iter:300, training loss:0.058686794512570216, training accuracy:98.4375
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-6.0000000 1.0000000 0.8750000 0.9333333 8.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 2.0000000
-9.0000000 0.8888889 1.0000000 0.9411765 8.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-
-Iter:400, training loss:0.08742103541529415, training accuracy:96.875
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-2.0000000 0.8000000 1.0000000 0.8888889 8.0000000
-3.0000000 1.0000000 0.8333333 0.9090909 6.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-10.0000000 1.0000000 0.9230769 0.9600000 13.0000000
-
-Iter:500, training loss:0.05873836245880005, training accuracy:98.4375
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-6.0000000 1.0000000 0.8571429 0.9230769 7.0000000
-7.0000000 0.8571429 1.0000000 0.9230769 6.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-
-Iter:500, validation loss:260.1580978627665, validation accuracy:96.43954918032787
-Iter:600, training loss:0.07584116043829209, training accuracy:98.4375
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-8.0000000 1.0000000 0.9230769 0.9600000 13.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-10.0000000 0.8333333 1.0000000 0.9090909 5.0000000
-
-Iter:700, training loss:0.07973166944626336, training accuracy:98.4375
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-8.0000000 0.8000000 1.0000000 0.8888889 4.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-10.0000000 1.0000000 0.9166667 0.9565217 12.0000000
-
-Iter:800, training loss:0.0063778595034221855, training accuracy:100.0
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 2.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-
-Iter:900, training loss:0.019673112167879484, training accuracy:100.0
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 12.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-
-Iter:1000, training loss:0.06137978002508307, training accuracy:96.875
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 8.0000000
-4.0000000 0.8333333 0.8333333 0.8333333 6.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 3.0000000
-8.0000000 0.8888889 0.8888889 0.8888889 9.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-
-Iter:1000, validation loss:238.62301345198944, validation accuracy:97.02868852459017
-Iter:1100, training loss:0.023325103696013115, training accuracy:100.0
-class precision recall f1-score num_true_labels
-1.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-2.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
-5.0000000 1.0000000 1.0000000 1.0000000 2.0000000
-6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
-7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
-8.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-9.0000000 1.0000000 1.0000000 1.0000000 9.0000000
-10.0000000 1.0000000 1.0000000 1.0000000 6.0000000
-...
-```
+Please see [Caffe2DML's reference guide](http://apache.github.io/systemml/reference-guide-caffe2dml) for more details.
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index d1dded7..1178009 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -50,8 +50,9 @@ for running SystemML from Spark via Scala, Python, or Java.
* [Standalone](standalone-guide) - Standalone mode allows data scientists to rapidly prototype algorithms on a single
machine in R-like and Python-like declarative languages.
* [JMLC](jmlc) - Java Machine Learning Connector.
-* *Experimental* [Caffe2DML API](beginners-guide-caffe2dml) for Deep Learning.
+* *Experimental* Caffe2DML API for Deep Learning ([beginner's guide](beginners-guide-caffe2dml), [reference guide](reference-guide-caffe2dml)) - Converts a Caffe specification to DML.
* *Experimental* [Keras2DML API](beginners-guide-keras2dml) for Deep Learning.
+
## Language Guides
* [Python API Reference](python-reference) - API Reference Guide for Python users.
@@ -79,3 +80,4 @@ command-line interface.
* [Engine Developer Guide](engine-dev-guide) - Guide for internal SystemML engine development.
* [Troubleshooting Guide](troubleshooting-guide) - Troubleshoot various issues related to SystemML.
* [Release Process](release-process) - Description of the SystemML release process.
+* [Using Native BLAS](native-backend) in SystemML.
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/docs/reference-guide-caffe2dml.md
----------------------------------------------------------------------
diff --git a/docs/reference-guide-caffe2dml.md b/docs/reference-guide-caffe2dml.md
new file mode 100644
index 0000000..24d5753
--- /dev/null
+++ b/docs/reference-guide-caffe2dml.md
@@ -0,0 +1,986 @@
+---
+layout: global
+title: Reference Guide for Caffe2DML users
+description: Reference Guide for Caffe2DML users
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+<br/>
+
+
+# Layers supported in Caffe2DML
+
+Caffe2DML aims to be as compatible with [the Caffe specification](http://caffe.berkeleyvision.org/tutorial/layers.html) as possible.
+The main differences are noted below, along with a usage guide that mirrors the Caffe specification.
+
+## Vision Layers
+
+### Convolution Layer
+
+Invokes [nn/layers/conv2d_builtin.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_builtin.dml)
+or [nn/layers/conv2d_depthwise.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_depthwise.dml) layer.
+
+**Required Parameters:**
+
+- num_output: the number of filters
+- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
+
+**Optional Parameters:**
+
+- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs
+- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input
+- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input
+- group (g) (default 1): If g > 1, we restrict the connectivity of each filter to a subset of the input.
+Specifically, the input and output channels are separated into g groups,
+and the ith output group channels will be only connected to the ith input group channels.
+Note: we only support depthwise convolution, hence the number of channels should be divisible by `g`
+
+**Parameters that are ignored:**
+
+- weight_filler: We use the heuristic by He et al., which limits the magnification of inputs/gradients
+during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n),
+under the assumption of ReLU neurons.
+- bias_filler: We use `constant bias_filler` with `value:0`
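As an illustration only (this is not the exact initializer used by `conv2d_builtin.dml`), the He et al. heuristic can be sketched as:

```python
import numpy as np

def he_init(fan_in, fan_out, seed=42):
    """He et al. initialization: unit-Gaussian weights scaled by sqrt(2/fan_in),
    which keeps the variance of activations roughly constant across ReLU layers."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)

W = he_init(800, 64)
```

For a convolution filter, `fan_in` would be `channels * kernel_h * kernel_w`.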
+
+**Sample Usage:**
+```
+layer {
+ name: "conv1"
+ type: "Convolution"
+ bottom: "data"
+ top: "conv1"
+ # learning rate and decay multipliers for the filters
+ param { lr_mult: 1 decay_mult: 1 }
+ # learning rate and decay multipliers for the biases
+ param { lr_mult: 2 decay_mult: 0 }
+ convolution_param {
+ num_output: 96 # learn 96 filters
+ kernel_size: 11 # each filter is 11x11
+ stride: 4 # step 4 pixels between each filter application
+ weight_filler {
+ type: "xavier" # initialize the filters from a Gaussian
+ }
+ bias_filler {
+ type: "constant" # initialize the biases to zero (0)
+ value: 0
+ }
+ }
+}
+```
+
+### Pooling Layer
+
+Invokes [nn/layers/max_pool2d_builtin.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/max_pool2d_builtin.dml) layer.
+
+**Required Parameters:**
+
+- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
+
+**Optional Parameters:**
+- pool (default MAX): the pooling method. Currently, we only support MAX; AVE and STOCHASTIC are not supported.
+- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input
+- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input
+
+**Sample Usage:**
+```
+layer {
+ name: "pool1"
+ type: "Pooling"
+ bottom: "conv1"
+ top: "pool1"
+ pooling_param {
+ pool: MAX
+ kernel_size: 3 # pool over a 3x3 region
+ stride: 2 # step two pixels (in the bottom blob) between pooling regions
+ }
+}
+```
+
+### Deconvolution Layer
+
+Invokes [nn/layers/conv2d_transpose.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_transpose.dml)
+or [nn/layers/conv2d_transpose_depthwise.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/conv2d_transpose_depthwise.dml) layer.
+
+**Required Parameters:**
+
+- num_output: the number of filters
+- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter
+
+**Optional Parameters:**
+
+- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs
+- pad (or pad_h and pad_w) (default 0): specifies the number of pixels to (implicitly) add to each side of the input
+- stride (or stride_h and stride_w) (default 1): specifies the intervals at which to apply the filters to the input
+- group (g) (default 1): If g > 1, we restrict the connectivity of each filter to a subset of the input.
+Specifically, the input and output channels are separated into g groups,
+and the ith output group channels will be only connected to the ith input group channels.
+Note: we only support depthwise convolution, hence the number of channels should be divisible by `g`
+
+**Parameters that are ignored:**
+
+- weight_filler: We use the heuristic by He et al., which limits the magnification of inputs/gradients
+during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n),
+under the assumption of ReLU neurons.
+- bias_filler: We use `constant bias_filler` with `value:0`
+
+**Sample Usage:**
+```
+layer {
+ name: "upconv_d5c_u4a"
+ type: "Deconvolution"
+ bottom: "u5d"
+ top: "u4a"
+ param {
+ lr_mult: 0.0
+ decay_mult: 0.0
+ }
+ convolution_param {
+ num_output: 190
+ bias_term: false
+ pad: 1
+ kernel_size: 4
+ group: 190
+ stride: 2
+ weight_filler {
+ type: "bilinear"
+ }
+ }
+}
+```
+
+
+## Common Layers
+
+### Inner Product / Fully Connected Layer
+
+Invokes [nn/layers/affine.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/affine.dml) layer.
+
+**Required Parameters:**
+
+- num_output: the number of filters
+
+**Parameters that are ignored:**
+- weight_filler (default type: 'constant' value: 0): We use the heuristic by He et al., which limits the magnification
+of inputs/gradients during forward/backward passes by scaling unit-Gaussian weights by a factor of sqrt(2/n), under the
+assumption of ReLU neurons.
+- bias_filler (default type: 'constant' value: 0): We use the default type and value.
+- bias_term (default true): specifies whether to learn and apply a set of additive biases to the filter outputs. We use `bias_term=true`.
+
+**Sample Usage:**
+```
+layer {
+ name: "fc8"
+ type: "InnerProduct"
+ # learning rate and decay multipliers for the weights
+ param { lr_mult: 1 decay_mult: 1 }
+ # learning rate and decay multipliers for the biases
+ param { lr_mult: 2 decay_mult: 0 }
+ inner_product_param {
+ num_output: 1000
+ weight_filler {
+ type: "xavier"
+ }
+ bias_filler {
+ type: "constant"
+ value: 0
+ }
+ }
+ bottom: "fc7"
+ top: "fc8"
+}
+```
+
+### Dropout Layer
+
+Invokes [nn/layers/dropout.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/dropout.dml) layer.
+
+**Optional Parameters:**
+
+- dropout_ratio (default: 0.5): the fraction of inputs to drop
+
+**Sample Usage:**
+```
+layer {
+ name: "drop1"
+ type: "Dropout"
+ bottom: "relu3"
+ top: "drop1"
+ dropout_param {
+ dropout_ratio: 0.5
+ }
+}
+```
+
+## Normalization Layers
+
+### BatchNorm Layer
+
+This is used in combination with Scale layer.
+
+Invokes [nn/layers/batch_norm2d.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/batch_norm2d.dml) layer.
+
+**Optional Parameters:**
+- moving_average_fraction (default = .999): Momentum value for moving averages. Typical values are in the range of [0.9, 0.999].
+- eps (default = 1e-5): Smoothing term to avoid divide by zero errors. Typical values are in the range of [1e-5, 1e-3].
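To illustrate how `moving_average_fraction` (call it `mu`) is used, here is a rough sketch of the training-phase update of the moving statistics; the exact update in `batch_norm2d.dml` may differ (e.g. in the variance estimator):

```python
import numpy as np

def update_moving_stats(running_mean, running_var, batch, mu=0.999):
    """One training-phase update of the moving statistics:
    new_running = mu * running + (1 - mu) * batch_statistic."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    new_mean = mu * running_mean + (1.0 - mu) * batch_mean
    new_var = mu * running_var + (1.0 - mu) * batch_var
    return new_mean, new_var
```

Larger `mu` values weight the history more heavily, so the accumulated statistics change more slowly between mini-batches.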
+
+**Parameters that are ignored:**
+- use_global_stats: If false, normalization is performed over the current mini-batch
+and global statistics are accumulated (but not yet used) by a moving average.
+If true, those accumulated mean and variance values are used for the normalization.
+By default, it is set to false when the network is in the training phase and true when the network is in the testing phase.
+
+**Sample Usage:**
+```
+layer {
+ bottom: "conv1"
+ top: "conv1"
+ name: "bn_conv1"
+ type: "BatchNorm"
+ batch_norm_param {
+ use_global_stats: true
+ }
+}
+layer {
+ bottom: "conv1"
+ top: "conv1"
+ name: "scale_conv1"
+ type: "Scale"
+ scale_param {
+ bias_term: true
+ }
+}
+```
+
+## Activation / Neuron Layers
+
+In general, activation / neuron layers are element-wise operators that take one bottom blob and produce one top blob of the same size.
+In the layers below, we omit the input and output sizes as they are identical.
+
+### ReLU / Rectified-Linear Layer
+
+Invokes [nn/layers/relu.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/relu.dml) layer.
+
+**Parameters that are ignored:**
+- negative_slope (default 0): specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
+
+**Sample Usage:**
+```
+layer {
+ name: "relu1"
+ type: "ReLU"
+ bottom: "conv1"
+ top: "conv1"
+}
+```
+
+### TanH Layer
+
+Invokes [nn/layers/tanh.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/tanh.dml) layer.
+
+**Sample Usage:**
+```
+layer {
+ name: "tanh1"
+ type: "TanH"
+ bottom: "conv1"
+ top: "conv1"
+}
+```
+
+### Sigmoid Layer
+
+Invokes [nn/layers/sigmoid.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/sigmoid.dml) layer.
+
+**Sample Usage:**
+```
+layer {
+ name: "sigmoid1"
+ type: "Sigmoid"
+ bottom: "conv1"
+ top: "conv1"
+}
+```
+
+
+### Threshold Layer
+
+Computes `X > threshold`
+
+**Parameters that are ignored:**
+- threshold (default: 0): the threshold value
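In NumPy terms, the layer is equivalent to the following element-wise comparison (a sketch, assuming the default threshold of 0):

```python
import numpy as np

X = np.array([[-1.0, 0.0, 2.5]])
threshold = 0.0
# Outputs 1 where X > threshold, else 0 (same shape as the input)
out = (X > threshold).astype(X.dtype)
```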
+
+**Sample Usage:**
+```
+layer {
+ name: "threshold1"
+ type: "Threshold"
+ bottom: "conv1"
+ top: "conv1"
+}
+```
+
+## Utility Layers
+
+### Eltwise Layer
+
+Element-wise operations such as product or sum between two blobs.
+
+**Parameters that are ignored:**
+- operation (default: SUM): the element-wise operation; only SUM is supported for now.
+- stable_prod_grad (default: true): whether to use an asymptotically slower (for >2 inputs) but more stable method
+of computing the gradient for the PROD operation. (No effect for the SUM operation.)
+
+**Sample Usage:**
+```
+layer {
+ bottom: "res2a_branch1"
+ bottom: "res2a_branch2c"
+ top: "res2a"
+ name: "res2a"
+ type: "Eltwise"
+}
+```
+
+### Concat Layer
+
+**Inputs:**
+- `n_i * c_i * h * w` for each input blob i from 1 to K.
+
+**Outputs:**
+- out: Outputs, of shape
+ - if axis = 0: `(n_1 + n_2 + ... + n_K) * c_1 * h * w`, and all input `c_i` should be the same.
+ - if axis = 1: `n_1 * (c_1 + c_2 + ... + c_K) * h * w`, and all input `n_i` should be the same.
+
+**Optional Parameters:**
+- axis (default: 1): The axis along which to concatenate.
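The shape rules above can be checked with NumPy's `concatenate` on two blobs in `n * c * h * w` layout (an illustration, not Caffe2DML code):

```python
import numpy as np

# Two input blobs with the same batch size (n=2) but different channel counts
a = np.zeros((2, 3, 4, 4))  # n=2, c=3, h=4, w=4
b = np.zeros((2, 5, 4, 4))  # n=2, c=5, h=4, w=4

# axis = 1 (the default): channel counts add up, batch sizes must match
out = np.concatenate([a, b], axis=1)  # shape (2, 8, 4, 4)
```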
+
+**Sample Usage:**
+```
+layer {
+ name: "concat_d5cc_u5a-b"
+ type: "Concat"
+ bottom: "u5a"
+ bottom: "d5c"
+ top: "u5b"
+}
+```
+
+### Softmax Layer
+
+Invokes [nn/layers/softmax.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax.dml) layer.
+
+Computes the forward pass for a softmax classifier. The inputs
+are interpreted as unnormalized, log-probabilities for each of
+N examples, and the softmax function transforms them to normalized
+probabilities.
+
+This can be interpreted as a generalization of the sigmoid
+function to multiple classes.
+
+`probs_ij = e^scores_ij / sum(e^scores_i)`
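The formula above can be sketched in NumPy; subtracting the row-wise maximum before exponentiating is a standard stability trick that leaves the probabilities unchanged:

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax: probs_ij = e^scores_ij / sum_j(e^scores_ij).
    Shifting by the row-wise max avoids overflow in exp() without
    changing the normalized result."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    unnorm = np.exp(shifted)
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```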
+
+**Parameters that are ignored:**
+- axis (default: 1): The axis along which to perform the softmax.
+
+**Sample Usage:**
+```
+layer {
+ name: "sm"
+ type: "Softmax"
+ bottom: "score"
+ top: "sm"
+}
+```
+
+## Loss Layers
+
+Loss drives learning by comparing an output to a target and assigning a cost to minimize.
+The loss itself is computed by the forward pass and the gradient w.r.t. the loss is computed by the backward pass.
+
+### Softmax with Loss Layer
+
+The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs.
+It’s conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient.
+
+Invokes [nn/layers/softmax.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax.dml)
+and [nn/layers/cross_entropy_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/cross_entropy_loss.dml)
+for classification problems.
+
+For image segmentation problems, invokes [nn/layers/softmax2d_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/softmax2d_loss.dml) layer.
+
+**Sample Usage:**
+```
+layer {
+ name: "loss"
+ type: "SoftmaxWithLoss"
+ bottom: "ip2"
+ bottom: "label"
+ top: "loss"
+}
+```
+
+### Euclidean layer
+
+The Euclidean loss layer computes the sum of squares of differences of its two inputs.
+
+Invokes [nn/layers/l2_loss.dml](https://github.com/apache/systemml/blob/master/scripts/nn/layers/l2_loss.dml) layer.
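As a sketch of the computation (assuming Caffe's `1/(2N)` scaling convention; check `l2_loss.dml` for the exact scaling used):

```python
import numpy as np

def euclidean_loss(pred, y):
    """Euclidean (L2) loss: sum of squared differences over all entries,
    divided by 2N, where N is the number of examples (rows)."""
    n = pred.shape[0]
    return 0.5 * float(np.sum((pred - y) ** 2)) / n
```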
+
+**Sample Usage:**
+```
+layer {
+ name: "loss"
+ type: "EuclideanLoss"
+ bottom: "ip2"
+ bottom: "label"
+ top: "loss"
+}
+```
+
+
+# Frequently asked questions
+
+#### What is the purpose of Caffe2DML API ?
+
+Most deep learning practitioners are likely to be more familiar with Caffe's specification
+than with the DML language. For these users, the Caffe2DML API reduces the learning curve for using SystemML.
+Instead of requiring users to write a DML script for training, fine-tuning, and testing a model,
+Caffe2DML takes as input a network and solver specified in the Caffe format
+and automatically generates the corresponding DML.
+
+#### With Caffe2DML, does SystemML now require Caffe to be installed ?
+
+Absolutely not. We only support Caffe's API for the convenience of the user, as stated above.
+Since Caffe's API is specified in the protobuf format, we are able to generate the Java parser files
+and do not require Caffe to be installed. This is also true for the TensorBoard feature of Caffe2DML.
+
+```
+Dml.g4 ---> antlr ---> DmlLexer.java, DmlListener.java, DmlParser.java ---> parse foo.dml
+caffe.proto ---> protoc ---> target/generated-sources/caffe/Caffe.java ---> parse caffe_network.proto, caffe_solver.proto
+```
+
+Again, the SystemML engine does not invoke (or depend on) Caffe for any of its runtime operators.
+Since the grammar files for the respective APIs (i.e. `caffe.proto`) are used by SystemML,
+we include their licenses in our jar files.
+
+#### How can I speedup the training with Caffe2DML ?
+
+- Enable native BLAS to improve the performance of CP convolution and matrix multiplication operators.
+If you are using OpenBLAS, please ensure that it was built with the `USE_OPENMP` flag turned on.
+For more details, see http://apache.github.io/systemml/native-backend
+
+```python
+caffe2dmlObject.setConfigProperty("sysml.native.blas", "auto")
+```
+
+- Turn on the experimental codegen feature. This should help reduce unnecessary allocation cost after every binary operation.
+
+```python
+caffe2dmlObject.setConfigProperty("sysml.codegen.enabled", "true").setConfigProperty("sysml.codegen.plancache", "true")
+```
+
+- Tune the [Garbage Collector](http://spark.apache.org/docs/latest/tuning.html#garbage-collection-tuning).
+
+- Enable GPU support (described below).
+
+#### How to enable GPU support in Caffe2DML ?
+
+To be consistent with other mllearn algorithms, we recommend that you use the following methods instead of setting
+`solver_mode` in the solver file.
+
+```python
+# The below method tells SystemML optimizer to use a GPU-enabled instruction if the operands fit in the GPU memory
+caffe2dmlObject.setGPU(True)
+# The below method tells SystemML optimizer to always use a GPU-enabled instruction irrespective of the memory requirement
+caffe2dmlObject.setForceGPU(True)
+```
+
+#### What is lr_policy in the solver specification ?
+
+The parameter `lr_policy` specifies the learning rate decay policy. Caffe2DML supports the following policies:
+- `fixed`: always return `base_lr`.
+- `step`: return `base_lr * gamma ^ (floor(iter / step))`
+- `exp`: return `base_lr * gamma ^ iter`
+- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
+- `poly`: the effective learning rate follows a polynomial decay, reaching zero at max_iter. return `base_lr * (1 - iter/max_iter) ^ (power)`
+- `sigmoid`: the effective learning rate follows a sigmoid decay. return `base_lr * (1/(1 + exp(-gamma * (iter - stepsize))))`
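The policies above can be expressed directly in Python (a sketch of the formulas, not Caffe2DML's internal code; the hyperparameter names follow the solver specification):

```python
import math

def learning_rate(policy, base_lr, iter, gamma=0.1, step=1000,
                  power=1.0, max_iter=2000, stepsize=500):
    """Learning-rate schedules matching the lr_policy formulas above."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** math.floor(iter / step)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1 / (1 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError("unknown lr_policy: " + policy)
```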
+
+#### How to set batch size ?
+
+Batch size is set in `data_param` of the Data layer:
+
+```
+layer {
+ name: "mnist"
+ type: "Data"
+ top: "data"
+ top: "label"
+ data_param {
+ source: "mnist_train"
+ batch_size: 64
+ backend: LMDB
+ }
+}
+```
+
+#### How to set maximum number of iterations for training ?
+
+The maximum number of iterations can be set in the solver specification:
+
+```
+# The maximum number of iterations
+max_iter: 2000
+```
+
+#### How to set the size of the validation dataset ?
+
+The size of the validation dataset is determined by the parameter `test_iter` and the batch size. For example, if the batch size is 64 and
+`test_iter` is 10, then the validation size is 640. This setting generates the following DML code internally:
+
+```
+num_images = nrow(y_full)
+BATCH_SIZE = 64
+num_validation = 10 * BATCH_SIZE
+X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
+X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
+num_images = nrow(y)
+```
+
+#### How to monitor loss via command-line ?
+
+To monitor loss, please set the following parameters in the solver specification:
+
+```
+# Display training loss and accuracy every 100 iterations
+display: 100
+# Carry out validation every 500 training iterations and display validation loss and accuracy.
+test_iter: 10
+test_interval: 500
+```
+
+#### How to pass a single jpeg image to Caffe2DML for prediction ?
+
+To convert a jpeg into NumPy matrix, you can use the [pillow package](https://pillow.readthedocs.io/) and
+SystemML's `convertImageToNumPyArr` utility function. The below pyspark code demonstrates the usage:
+
+```python
+from PIL import Image
+import systemml as sml
+from systemml.mllearn import Caffe2DML
+img_shape = (3, 224, 224)
+input_image = sml.convertImageToNumPyArr(Image.open(img_file_path), img_shape=img_shape)
+resnet = Caffe2DML(sqlCtx, solver='ResNet_50_solver.proto', weights='ResNet_50_pretrained_weights', input_shape=img_shape)
+resnet.predict(input_image)
+```
+
+#### How to prepare a directory of jpeg images for training with Caffe2DML ?
+
+The below pyspark code assumes that the input dataset has 2 labels `cat` and `dogs` and the filename has these labels as prefix.
+We iterate through the directory and convert each jpeg image into pyspark.ml.linalg.Vector using pyspark.
+These vectors are stored as DataFrame and randomized using Spark SQL's `orderBy(rand())` function.
+The DataFrame is then saved in parquet format to reduce the cost of preprocessing for repeated training.
+
+```python
+from systemml.mllearn import Caffe2DML
+from pyspark.sql import SQLContext
+import numpy as np
+import urllib, os, scipy.ndimage
+from pyspark.ml.linalg import Vectors
+from pyspark import StorageLevel
+import systemml as sml
+from pyspark.sql.functions import rand
+# ImageNet specific parameters
+img_shape = (3, 224, 224)
+train_dir = '/home/biuser/dogs_vs_cats/train'
+def getLabelFeatures(filename):
+ from PIL import Image
+ vec = Vectors.dense(sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])
+ if filename.lower().startswith('cat'):
+ return (1, vec)
+ elif filename.lower().startswith('dog'):
+ return (2, vec)
+ else:
+ raise ValueError('Expected the filename to start with either cat or dog')
+list_jpeg_files = os.listdir(train_dir)
+# 10 files per partition
+train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : getLabelFeatures(filename)).toDF(['label', 'features']).orderBy(rand())
+# Optional, but helps separate the conversion cost from the training cost
+# Alternatively, this dataframe can be passed directly to `caffe2dml_model.fit(train_df)`
+train_df.write.parquet('kaggle-cats-dogs.parquet')
+```
+
+An alternative way to load images into a PySpark DataFrame for prediction is to use MLlib's LabeledPoint class:
+
+```python
+list_jpeg_files = os.listdir(train_dir)
+train_df = sc.parallelize(list_jpeg_files, int(len(list_jpeg_files)/10)).map(lambda filename : LabeledPoint(0, sml.convertImageToNumPyArr(Image.open(os.path.join(train_dir, filename)), img_shape=img_shape)[0,:])).toDF().select('features')
+# Note: convertVectorColumnsToML has an additional serialization cost
+train_df = MLUtils.convertVectorColumnsToML(train_df)
+```
+
+
+#### Can I use Caffe2DML via Scala ?
+
+Though we recommend using Caffe2DML via its Python interface, it is possible to use it by creating an object of the class
+`org.apache.sysml.api.dl.Caffe2DML`. Note that Caffe2DML's Scala API is packaged in `systemml-*-extra.jar`.
+
+#### How can I get summary information of my network ?
+
+
+```python
+lenet.summary()
+```
+
+Output:
+
+```
++-----+---------------+--------------+------------+---------+-----------+---------+
+| Name| Type| Output| Weight| Bias| Top| Bottom|
++-----+---------------+--------------+------------+---------+-----------+---------+
+|mnist| Data| (, 1, 28, 28)| | |mnist,mnist| |
+|conv1| Convolution|(, 32, 28, 28)| [32 X 25]| [32 X 1]| conv1| mnist|
+|relu1| ReLU|(, 32, 28, 28)| | | relu1| conv1|
+|pool1| Pooling|(, 32, 14, 14)| | | pool1| relu1|
+|conv2| Convolution|(, 64, 14, 14)| [64 X 800]| [64 X 1]| conv2| pool1|
+|relu2| ReLU|(, 64, 14, 14)| | | relu2| conv2|
+|pool2| Pooling| (, 64, 7, 7)| | | pool2| relu2|
+| ip1| InnerProduct| (, 512, 1, 1)|[3136 X 512]|[1 X 512]| ip1| pool2|
+|relu3| ReLU| (, 512, 1, 1)| | | relu3| ip1|
+|drop1| Dropout| (, 512, 1, 1)| | | drop1| relu3|
+| ip2| InnerProduct| (, 10, 1, 1)| [512 X 10]| [1 X 10]| ip2| drop1|
+| loss|SoftmaxWithLoss| (, 10, 1, 1)| | | loss|ip2,mnist|
++-----+---------------+--------------+------------+---------+-----------+---------+
+```
+
+#### How can I view the script generated by Caffe2DML ?
+
+To view the generated DML script (and additional debugging information), please set the `debug` parameter to True.
+
+```python
+lenet.set(debug=True)
+```
+
+Output:
+```
+001|debug = TRUE
+002|source("nn/layers/softmax.dml") as softmax
+003|source("nn/layers/cross_entropy_loss.dml") as cross_entropy_loss
+004|source("nn/layers/conv2d_builtin.dml") as conv2d_builtin
+005|source("nn/layers/relu.dml") as relu
+006|source("nn/layers/max_pool2d_builtin.dml") as max_pool2d_builtin
+007|source("nn/layers/affine.dml") as affine
+008|source("nn/layers/dropout.dml") as dropout
+009|source("nn/optim/sgd_momentum.dml") as sgd_momentum
+010|source("nn/layers/l2_reg.dml") as l2_reg
+011|X_full_path = ifdef($X, " ")
+012|X_full = read(X_full_path)
+013|y_full_path = ifdef($y, " ")
+014|y_full = read(y_full_path)
+015|num_images = nrow(y_full)
+016|# Convert to one-hot encoding (Assumption: 1-based labels)
+017|y_full = table(seq(1,num_images,1), y_full, num_images, 10)
+018|weights = ifdef($weights, " ")
+019|# Initialize the layers and solvers
+020|X_full = X_full * 0.00390625
+021|BATCH_SIZE = 64
+022|[conv1_weight,conv1_bias] = conv2d_builtin::init(32,1,5,5)
+023|[conv2_weight,conv2_bias] = conv2d_builtin::init(64,32,5,5)
+024|[ip1_weight,ip1_bias] = affine::init(3136,512)
+025|[ip2_weight,ip2_bias] = affine::init(512,10)
+026|conv1_weight_v = sgd_momentum::init(conv1_weight)
+027|conv1_bias_v = sgd_momentum::init(conv1_bias)
+028|conv2_weight_v = sgd_momentum::init(conv2_weight)
+029|conv2_bias_v = sgd_momentum::init(conv2_bias)
+030|ip1_weight_v = sgd_momentum::init(ip1_weight)
+031|ip1_bias_v = sgd_momentum::init(ip1_bias)
+032|ip2_weight_v = sgd_momentum::init(ip2_weight)
+033|ip2_bias_v = sgd_momentum::init(ip2_bias)
+034|num_validation = 10 * BATCH_SIZE
+035|# Sanity check to ensure that validation set is not too large
+036|if(num_validation > ceil(0.3 * num_images)) {
+037| max_test_iter = floor(ceil(0.3 * num_images) / BATCH_SIZE)
+038| stop("Too large validation size. Please reduce test_iter to " + max_test_iter)
+039|}
+040|X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]; X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]; num_images = nrow(y)
+041|num_iters_per_epoch = ceil(num_images / BATCH_SIZE)
+042|max_epochs = ceil(2000 / num_iters_per_epoch)
+043|iter = 0
+044|lr = 0.01
+045|for(e in 1:max_epochs) {
+046| for(i in 1:num_iters_per_epoch) {
+047| beg = ((i-1) * BATCH_SIZE) %% num_images + 1; end = min(beg + BATCH_SIZE - 1, num_images); Xb = X[beg:end,]; yb = y[beg:end,];
+048| iter = iter + 1
+049| # Perform forward pass
+050| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
+051| out4 = relu::forward(out3)
+052| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
+053| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
+054| out7 = relu::forward(out6)
+055| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
+056| out9 = affine::forward(out8,ip1_weight,ip1_bias)
+057| out10 = relu::forward(out9)
+058| [out11,mask11] = dropout::forward(out10,0.5,-1)
+059| out12 = affine::forward(out11,ip2_weight,ip2_bias)
+060| out13 = softmax::forward(out12)
+061| # Perform backward pass
+062| dProbs = cross_entropy_loss::backward(out13,yb); dOut13 = softmax::backward(dProbs,out12); dOut13_12 = dOut13; dOut13_2 = dOut13;
+063| [dOut12,ip2_dWeight,ip2_dBias] = affine::backward(dOut13_12,out11,ip2_weight,ip2_bias); dOut12_11 = dOut12;
+064| dOut11 = dropout::backward(dOut12_11,out10,0.5,mask11); dOut11_10 = dOut11;
+065| dOut10 = relu::backward(dOut11_10,out9); dOut10_9 = dOut10;
+066| [dOut9,ip1_dWeight,ip1_dBias] = affine::backward(dOut10_9,out8,ip1_weight,ip1_bias); dOut9_8 = dOut9;
+067| dOut8 = max_pool2d_builtin::backward(dOut9_8,7,7,out7,64,14,14,2,2,2,2,0,0); dOut8_7 = dOut8;
+068| dOut7 = relu::backward(dOut8_7,out6); dOut7_6 = dOut7;
+069| [dOut6,conv2_dWeight,conv2_dBias] = conv2d_builtin::backward(dOut7_6,14,14,out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2); dOut6_5 = dOut6;
+070| dOut5 = max_pool2d_builtin::backward(dOut6_5,14,14,out4,32,28,28,2,2,2,2,0,0); dOut5_4 = dOut5;
+071| dOut4 = relu::backward(dOut5_4,out3); dOut4_3 = dOut4;
+072| [dOut3,conv1_dWeight,conv1_dBias] = conv2d_builtin::backward(dOut4_3,28,28,Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2); dOut3_2 = dOut3;
+073| # Update the parameters
+074| conv1_dWeight_reg = l2_reg::backward(conv1_weight, 5.000000237487257E-4)
+075| conv1_dWeight = conv1_dWeight + conv1_dWeight_reg
+076| [conv1_weight,conv1_weight_v] = sgd_momentum::update(conv1_weight,conv1_dWeight,(lr * 1.0),0.8999999761581421,conv1_weight_v)
+077| [conv1_bias,conv1_bias_v] = sgd_momentum::update(conv1_bias,conv1_dBias,(lr * 2.0),0.8999999761581421,conv1_bias_v)
+078| conv2_dWeight_reg = l2_reg::backward(conv2_weight, 5.000000237487257E-4)
+079| conv2_dWeight = conv2_dWeight + conv2_dWeight_reg
+080| [conv2_weight,conv2_weight_v] = sgd_momentum::update(conv2_weight,conv2_dWeight,(lr * 1.0),0.8999999761581421,conv2_weight_v)
+081| [conv2_bias,conv2_bias_v] = sgd_momentum::update(conv2_bias,conv2_dBias,(lr * 2.0),0.8999999761581421,conv2_bias_v)
+082| ip1_dWeight_reg = l2_reg::backward(ip1_weight, 5.000000237487257E-4)
+083| ip1_dWeight = ip1_dWeight + ip1_dWeight_reg
+084| [ip1_weight,ip1_weight_v] = sgd_momentum::update(ip1_weight,ip1_dWeight,(lr * 1.0),0.8999999761581421,ip1_weight_v)
+085| [ip1_bias,ip1_bias_v] = sgd_momentum::update(ip1_bias,ip1_dBias,(lr * 2.0),0.8999999761581421,ip1_bias_v)
+086| ip2_dWeight_reg = l2_reg::backward(ip2_weight, 5.000000237487257E-4)
+087| ip2_dWeight = ip2_dWeight + ip2_dWeight_reg
+088| [ip2_weight,ip2_weight_v] = sgd_momentum::update(ip2_weight,ip2_dWeight,(lr * 1.0),0.8999999761581421,ip2_weight_v)
+089| [ip2_bias,ip2_bias_v] = sgd_momentum::update(ip2_bias,ip2_dBias,(lr * 2.0),0.8999999761581421,ip2_bias_v)
+090| # Compute training loss & accuracy
+091| if(iter %% 100 == 0) {
+092| loss = 0
+093| accuracy = 0
+094| tmp_loss = cross_entropy_loss::forward(out13,yb)
+095| loss = loss + tmp_loss
+096| true_yb = rowIndexMax(yb)
+097| predicted_yb = rowIndexMax(out13)
+098| accuracy = mean(predicted_yb == true_yb)*100
+099| training_loss = loss
+100| training_accuracy = accuracy
+101| print("Iter:" + iter + ", training loss:" + training_loss + ", training accuracy:" + training_accuracy)
+102| if(debug) {
+103| num_rows_error_measures = min(10, ncol(yb))
+104| error_measures = matrix(0, rows=num_rows_error_measures, cols=5)
+105| for(class_i in 1:num_rows_error_measures) {
+106| tp = sum( (true_yb == predicted_yb) * (true_yb == class_i) )
+107| tp_plus_fp = sum( (predicted_yb == class_i) )
+108| tp_plus_fn = sum( (true_yb == class_i) )
+109| precision = tp / tp_plus_fp
+110| recall = tp / tp_plus_fn
+111| f1Score = 2*precision*recall / (precision+recall)
+112| error_measures[class_i,1] = class_i
+113| error_measures[class_i,2] = precision
+114| error_measures[class_i,3] = recall
+115| error_measures[class_i,4] = f1Score
+116| error_measures[class_i,5] = tp_plus_fn
+117| }
+118| print("class \tprecision\trecall \tf1-score\tnum_true_labels\n" + toString(error_measures, decimal=7, sep="\t"))
+119| }
+120| }
+121| # Compute validation loss & accuracy
+122| if(iter %% 500 == 0) {
+123| loss = 0
+124| accuracy = 0
+125| validation_loss = 0
+126| validation_accuracy = 0
+127| for(iVal in 1:num_iters_per_epoch) {
+128| beg = ((iVal-1) * BATCH_SIZE) %% num_validation + 1; end = min(beg + BATCH_SIZE - 1, num_validation); Xb = X_val[beg:end,]; yb = y_val[beg:end,];
+129| # Perform forward pass
+130| [out3,ignoreHout_3,ignoreWout_3] = conv2d_builtin::forward(Xb,conv1_weight,conv1_bias,1,28,28,5,5,1,1,2,2)
+131| out4 = relu::forward(out3)
+132| [out5,ignoreHout_5,ignoreWout_5] = max_pool2d_builtin::forward(out4,32,28,28,2,2,2,2,0,0)
+133| [out6,ignoreHout_6,ignoreWout_6] = conv2d_builtin::forward(out5,conv2_weight,conv2_bias,32,14,14,5,5,1,1,2,2)
+134| out7 = relu::forward(out6)
+135| [out8,ignoreHout_8,ignoreWout_8] = max_pool2d_builtin::forward(out7,64,14,14,2,2,2,2,0,0)
+136| out9 = affine::forward(out8,ip1_weight,ip1_bias)
+137| out10 = relu::forward(out9)
+138| [out11,mask11] = dropout::forward(out10,0.5,-1)
+139| out12 = affine::forward(out11,ip2_weight,ip2_bias)
+140| out13 = softmax::forward(out12)
+141| tmp_loss = cross_entropy_loss::forward(out13,yb)
+142| loss = loss + tmp_loss
+143| true_yb = rowIndexMax(yb)
+144| predicted_yb = rowIndexMax(out13)
+145| accuracy = mean(predicted_yb == true_yb)*100
+146| validation_loss = validation_loss + loss
+147| validation_accuracy = validation_accuracy + accuracy
+148| }
+149| validation_accuracy = validation_accuracy / num_iters_per_epoch
+150| print("Iter:" + iter + ", validation loss:" + validation_loss + ", validation accuracy:" + validation_accuracy)
+151| }
+152| }
+153| # Learning rate
+154| lr = (0.009999999776482582 * 0.949999988079071^e)
+155|}
+
+Iter:100, training loss:0.24014199350958168, training accuracy:87.5
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+3.0000000 0.8888889 0.8888889 0.8888889 9.0000000
+4.0000000 0.7500000 0.7500000 0.7500000 4.0000000
+5.0000000 0.7500000 1.0000000 0.8571429 3.0000000
+6.0000000 0.8333333 1.0000000 0.9090909 5.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+8.0000000 0.8571429 0.7500000 0.8000000 8.0000000
+9.0000000 1.0000000 0.5714286 0.7272727 7.0000000
+10.0000000 0.7272727 0.8888889 0.8000000 9.0000000
+
+Iter:200, training loss:0.09555593867171894, training accuracy:98.4375
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+7.0000000 1.0000000 0.6666667 0.8000000 3.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+9.0000000 0.8571429 1.0000000 0.9230769 6.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+
+Iter:300, training loss:0.058686794512570216, training accuracy:98.4375
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+6.0000000 1.0000000 0.8750000 0.9333333 8.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 2.0000000
+9.0000000 0.8888889 1.0000000 0.9411765 8.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+
+Iter:400, training loss:0.08742103541529415, training accuracy:96.875
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+2.0000000 0.8000000 1.0000000 0.8888889 8.0000000
+3.0000000 1.0000000 0.8333333 0.9090909 6.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+10.0000000 1.0000000 0.9230769 0.9600000 13.0000000
+
+Iter:500, training loss:0.05873836245880005, training accuracy:98.4375
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+6.0000000 1.0000000 0.8571429 0.9230769 7.0000000
+7.0000000 0.8571429 1.0000000 0.9230769 6.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+
+Iter:500, validation loss:260.1580978627665, validation accuracy:96.43954918032787
+Iter:600, training loss:0.07584116043829209, training accuracy:98.4375
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+8.0000000 1.0000000 0.9230769 0.9600000 13.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+10.0000000 0.8333333 1.0000000 0.9090909 5.0000000
+
+Iter:700, training loss:0.07973166944626336, training accuracy:98.4375
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+8.0000000 0.8000000 1.0000000 0.8888889 4.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+10.0000000 1.0000000 0.9166667 0.9565217 12.0000000
+
+Iter:800, training loss:0.0063778595034221855, training accuracy:100.0
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 2.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+
+Iter:900, training loss:0.019673112167879484, training accuracy:100.0
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 12.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+
+Iter:1000, training loss:0.06137978002508307, training accuracy:96.875
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 8.0000000
+4.0000000 0.8333333 0.8333333 0.8333333 6.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 5.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 3.0000000
+8.0000000 0.8888889 0.8888889 0.8888889 9.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+
+Iter:1000, validation loss:238.62301345198944, validation accuracy:97.02868852459017
+Iter:1100, training loss:0.023325103696013115, training accuracy:100.0
+class precision recall f1-score num_true_labels
+1.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+2.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+3.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+4.0000000 1.0000000 1.0000000 1.0000000 4.0000000
+5.0000000 1.0000000 1.0000000 1.0000000 2.0000000
+6.0000000 1.0000000 1.0000000 1.0000000 10.0000000
+7.0000000 1.0000000 1.0000000 1.0000000 7.0000000
+8.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+9.0000000 1.0000000 1.0000000 1.0000000 9.0000000
+10.0000000 1.0000000 1.0000000 1.0000000 6.0000000
+...
+```
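The per-batch metrics in the log above come straight from the generated DML: `cross_entropy_loss::forward` on the softmax output, and `mean(rowIndexMax(out13) == rowIndexMax(yb))*100` for accuracy. A minimal NumPy sketch of that computation (the scores and labels below are illustrative, not from the run above):

```python
import numpy as np

# Mirrors the generated DML above: softmax probabilities for a batch,
# one-hot labels, cross-entropy loss, and batch accuracy in percent.
def softmax(scores):
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])
yb = np.array([[1, 0, 0],
               [0, 0, 1]])                       # one-hot labels

probs = softmax(scores)
loss = -np.sum(yb * np.log(probs)) / len(yb)     # cross_entropy_loss::forward
true_yb = yb.argmax(axis=1)                      # rowIndexMax(yb)
predicted_yb = probs.argmax(axis=1)              # rowIndexMax(out13)
accuracy = np.mean(predicted_yb == true_yb) * 100  # 50.0 for this toy batch
print(loss, accuracy)
```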
+
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/layers/l2_loss.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/l2_loss.dml b/scripts/nn/layers/l2_loss.dml
index 0482f25..67b9870 100644
--- a/scripts/nn/layers/l2_loss.dml
+++ b/scripts/nn/layers/l2_loss.dml
@@ -69,4 +69,3 @@ backward = function(matrix[double] pred, matrix[double] y)
N = nrow(y)
dpred = (pred-y) / N
}
-
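The `l2_loss` backward pass shown in the context lines computes `dpred = (pred-y) / N`. A quick NumPy sketch with a finite-difference check of that gradient; the forward formula (`0.5 * sum((pred-y)^2) / N`) is assumed here from the gradient, not copied from the script:

```python
import numpy as np

# Sketch of an l2_loss forward/backward pair. The forward formula is an
# assumption consistent with the gradient above: dpred = (pred - y) / N.
def l2_forward(pred, y):
    N = y.shape[0]
    return 0.5 * np.sum((pred - y) ** 2) / N

def l2_backward(pred, y):
    N = y.shape[0]
    return (pred - y) / N

# Finite-difference check of one entry of the gradient
pred = np.array([[0.3, 0.7], [0.9, 0.1]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])
eps = 1e-6
p = pred.copy(); p[0, 0] += eps
num_grad = (l2_forward(p, y) - l2_forward(pred, y)) / eps
assert abs(num_grad - l2_backward(pred, y)[0, 0]) < 1e-5
```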
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/layers/tanh.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/layers/tanh.dml b/scripts/nn/layers/tanh.dml
index d849d70..23fd106 100644
--- a/scripts/nn/layers/tanh.dml
+++ b/scripts/nn/layers/tanh.dml
@@ -29,10 +29,9 @@ forward = function(matrix[double] X)
/*
* Computes the forward pass for a tanh nonlinearity layer.
*
- * ```
- * tanh(x) = (e^x - e^-x) / (e^x + e^-x)
- * = 2 * sigmoid(2x) - 1
- * ```
+ * ```
+ * tanh(x) = (e^x - e^-x) / (e^x + e^-x)
+ * ```
*
* Inputs:
* - X: Inputs, of shape (any, any).
@@ -40,10 +39,7 @@ forward = function(matrix[double] X)
* Outputs:
* - out: Outputs, of same shape as `X`.
*/
- # out = (exp(X) - exp(-X)) / (exp(X) + exp(-X))
- # Simplification of the above formulation to use the sigmoid function:
- sigma2X = sigmoid::forward(2*X)
- out = 2*sigma2X - 1
+ out = tanh(X)
}
backward = function(matrix[double] dout, matrix[double] X)
@@ -58,8 +54,7 @@ backward = function(matrix[double] dout, matrix[double] X)
* Outputs:
* - dX: Gradient wrt `X`, of same shape as `X`.
*/
- sigma2X = sigmoid::forward(2*X)
- out = 2*sigma2X - 1
+ out = tanh(X)
dX = (1-out^2) * dout
}
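The diff above replaces the sigmoid-based formulation with the builtin `tanh`. The two are mathematically identical via `tanh(x) = 2*sigmoid(2x) - 1`, and the backward pass uses the derivative `1 - tanh(x)^2`. A NumPy sketch verifying both:

```python
import numpy as np

# Checks the identity tanh(x) = 2*sigmoid(2x) - 1 used by the old
# formulation, and the gradient (1 - tanh(x)^2) * dout used in backward.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.random.randn(2, 3)
dout = np.random.randn(2, 3)

out_builtin = np.tanh(X)
out_old = 2 * sigmoid(2 * X) - 1          # old sigmoid-based formulation
assert np.allclose(out_builtin, out_old)

dX = (1 - out_builtin ** 2) * dout        # backward
# Finite-difference check on one entry
eps = 1e-6
Xp = X.copy(); Xp[0, 0] += eps
num = (np.tanh(Xp)[0, 0] - np.tanh(X)[0, 0]) / eps * dout[0, 0]
assert abs(num - dX[0, 0]) < 1e-4
```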
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/test/run_tests.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/test/run_tests.dml b/scripts/nn/test/run_tests.dml
index 0f42816..27d6a4a 100644
--- a/scripts/nn/test/run_tests.dml
+++ b/scripts/nn/test/run_tests.dml
@@ -105,6 +105,8 @@ test::top_k_row()
test::top_k()
test::top_k2d()
test::softmax2d()
+test::compare_tanh_builtin_forward_with_old()
+test::compare_tanh_builtin_backward_with_old()
print("---")
print("Other tests complete -- look for any ERRORs or WARNINGs.")
http://git-wip-us.apache.org/repos/asf/systemml/blob/61dcc85e/scripts/nn/test/test.dml
----------------------------------------------------------------------
diff --git a/scripts/nn/test/test.dml b/scripts/nn/test/test.dml
index 06f4632..2a04f97 100644
--- a/scripts/nn/test/test.dml
+++ b/scripts/nn/test/test.dml
@@ -39,6 +39,7 @@ source("nn/test/conv2d_simple.dml") as conv2d_simple
source("nn/test/max_pool2d_simple.dml") as max_pool2d_simple
source("nn/test/util.dml") as test_util
source("nn/util.dml") as util
+source("nn/layers/sigmoid.dml") as sigmoid
batch_norm1d = function() {
/*
@@ -825,6 +826,58 @@ tanh = function() {
}
}
+compare_tanh_builtin_forward_with_old = function() {
+ /*
+ * Compare the builtin `tanh` forward function with the old formulation.
+ */
+ print("Testing the builtin tanh forward function.")
+
+ # Generate data
+ N = 2 # num examples
+ C = 3 # num channels
+ X = rand(rows=N, cols=C, pdf="normal")
+
+ out = tanh::forward(X)
+
+ sigma2X = sigmoid::forward(2*X)
+ out_ref = 2*sigma2X - 1
+
+ # Equivalency check
+ for (i in 1:nrow(out)) {
+ for (j in 1:ncol(out)) {
+ rel_error = test_util::check_rel_error(as.scalar(out[i,j]), as.scalar(out_ref[i,j]),
+ 1e-10, 1e-12)
+ }
+ }
+}
+
+compare_tanh_builtin_backward_with_old = function() {
+ /*
+ * Compare the builtin `tanh` backward function with the old formulation.
+ */
+ print("Testing the builtin tanh backward function.")
+
+ # Generate data
+ N = 2 # num examples
+ C = 3 # num channels
+ X = rand(rows=N, cols=C, pdf="normal")
+ dout = rand(rows=N, cols=C, pdf="normal")
+
+ sigma2X = sigmoid::forward(2*X)
+ out_old = 2*sigma2X - 1
+ out_ref = (1-out_old^2) * dout
+
+ out = tanh::backward(dout, X)
+
+ # Equivalency check
+ for (i in 1:nrow(out)) {
+ for (j in 1:ncol(out)) {
+ rel_error = test_util::check_rel_error(as.scalar(out[i,j]), as.scalar(out_ref[i,j]),
+ 1e-10, 1e-12)
+ }
+ }
+}
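The equivalency checks above loop over entries and call `test_util::check_rel_error` with thresholds `1e-10` and `1e-12`. A rough Python analogue of such a check — the formula below (relative error with an absolute-error fallback near zero) is an assumption for illustration, not SystemML's actual implementation:

```python
import numpy as np

# Hypothetical analogue of an elementwise relative-error check.
# The thresholds follow the test above; the exact formula is assumed.
def check_rel_error(x, x_ref, thresh_rel=1e-10, thresh_abs=1e-12):
    denom = max(abs(x), abs(x_ref))
    rel = abs(x - x_ref) / denom if denom > thresh_abs else 0.0
    ok = rel < thresh_rel or abs(x - x_ref) < thresh_abs
    return rel, ok

# tanh(0.5) vs 2*sigmoid(2*0.5) - 1 agree to machine precision
rel, ok = check_rel_error(np.tanh(0.5), 2 / (1 + np.exp(-1.0)) - 1)
assert ok
```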
+
threshold = function() {
/*
* Test for the threshold function.