Posted to commits@systemml.apache.org by ni...@apache.org on 2019/03/27 15:58:53 UTC

[systemml] branch gh-pages updated (878f757 -> 47ce217)

This is an automated email from the ASF dual-hosted git repository.

niketanpansare pushed a change to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/systemml.git.


    from 878f757  [SYSTEMML-540] Added ternary aggregate operators for GPU backend
     new 9deb19c  [SYSTEMML-540] Added looped_minibatch training algorithm in Keras2DML
     new 47ce217  [SYSTEMML-540] Updated Keras2DML to match Keras API and improved rmvar performance

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 beginners-guide-caffe2dml.md                       |   2 +-
 beginners-guide-keras2dml.md                       | 132 ++++------------
 gpu.md                                             |  12 +-
 index.md                                           |   4 +-
 reference-guide-caffe2dml.md                       |  68 ++++++++
 ...de-keras2dml.md => reference-guide-keras2dml.md | 171 +++++++--------------
 6 files changed, 165 insertions(+), 224 deletions(-)
 copy beginners-guide-keras2dml.md => reference-guide-keras2dml.md (55%)


[systemml] 01/02: [SYSTEMML-540] Added looped_minibatch training algorithm in Keras2DML

Posted by ni...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

niketanpansare pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/systemml.git

commit 9deb19ca8092b20a4cebcb9bdbc91fb444b1918b
Author: Niketan Pansare <np...@us.ibm.com>
AuthorDate: Mon Mar 25 12:33:50 2019 -0700

    [SYSTEMML-540] Added looped_minibatch training algorithm in Keras2DML
    
    - This algorithm performs multiple forward-backward passes (= the `parallel_batches` parameter) with the given batch size, aggregates the gradients, and finally updates the model.
    - Updated the documentation.
---
 beginners-guide-caffe2dml.md |  2 +-
 beginners-guide-keras2dml.md | 35 ++++++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/beginners-guide-caffe2dml.md b/beginners-guide-caffe2dml.md
index 8814283..db74feb 100644
--- a/beginners-guide-caffe2dml.md
+++ b/beginners-guide-caffe2dml.md
@@ -161,7 +161,7 @@ Iter:2000, validation loss:173.66147359346, validation accuracy:97.4897540983606
 
 Unlike Caffe, where the default train and test algorithm is `minibatch`, you can specify the
 algorithm using the parameters `train_algo` and `test_algo` (valid values are: `minibatch`, `allreduce_parallel_batches`, 
-and `allreduce`). Here are some common settings:
+`looped_minibatch`, and `allreduce`). Here are some common settings:
 
 |                                                                          | PySpark script                                                                                                                           | Changes to Network/Solver                                              |
 |--------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
diff --git a/beginners-guide-keras2dml.md b/beginners-guide-keras2dml.md
index 4517be5..2259397 100644
--- a/beginners-guide-keras2dml.md
+++ b/beginners-guide-keras2dml.md
@@ -208,4 +208,37 @@ For example: for the expression `Keras2DML(..., display=100, test_iter=10, test_
 To verify that Keras2DML produces the same results as other Keras backends, we have [Python unit tests](https://github.com/apache/systemml/blob/master/src/main/python/tests/test_nn_numpy.py)
 that compare the results of Keras2DML with those of TensorFlow. We assume that the Keras team ensures that all their backends are consistent with their TensorFlow backend.
 
-
+#### How can I train very deep models on GPU?
+
+Unlike Keras, where the default train and test algorithm is `minibatch`, you can specify the
+algorithm using the parameters `train_algo` and `test_algo` (valid values are: `minibatch`, `allreduce_parallel_batches`, 
+`looped_minibatch`, and `allreduce`). Here are some common settings:
+
+|                                                                          | PySpark script                                                                                                                           | Changes to Network/Solver                                              |
+|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
+| Single-node CPU execution (similar to Caffe with solver_mode: CPU)       | `lenet.set(train_algo="minibatch", test_algo="minibatch")`                                                                               | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Single-node single-GPU execution                                         | `lenet.set(train_algo="minibatch", test_algo="minibatch").setGPU(True).setForceGPU(True)`                                                | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Single-node multi-GPU execution (similar to Caffe with solver_mode: GPU) | `lenet.set(train_algo="allreduce_parallel_batches", test_algo="minibatch", parallel_batches=num_gpu).setGPU(True).setForceGPU(True)`     | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+| Distributed prediction                                                   | `lenet.set(test_algo="allreduce")`                                                                                                       |                                                                        |
+| Distributed synchronous training                                         | `lenet.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)`                                                 | Ensure that `batch_size` is set to an appropriate value (for example: 64) |
+
+Here are high-level guidelines to train very deep models on GPU with Keras2DML (and Caffe2DML):
+
+1. If there exists at least one layer/operator that does not fit on the device, allow SystemML's optimizer to perform operator placement based on its memory estimates: `sysml_model.setGPU(True)`.
+2. If each individual layer/operator fits on the device, but the entire network does not fit with a batch size of 1, then
+- Rely on SystemML's GPU Memory Manager to perform automatic eviction (recommended): `sysml_model.setGPU(True) # Optional: .setForceGPU(True)`
+- Or enable Nvidia's Unified Memory: `sysml_model.setConfigProperty('sysml.gpu.memory.allocator', 'unified_memory')`
+3. If the entire neural network does not fit in the GPU memory with the user-specified `batch_size`, but fits with a smaller `local_batch_size` (that is, `1 << local_batch_size < batch_size`: much larger than 1 but smaller than `batch_size`), then
+- Use either of the above two options.
+- Or use a `train_algo` that performs multiple forward-backward passes with batch size `local_batch_size`, aggregates the gradients, and finally updates the model:
+```python
+sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
+sysml_model.set(train_algo="looped_minibatch", parallel_batches=int(batch_size/local_batch_size))
+sysml_model.setGPU(True).setForceGPU(True)
+```
+- Or add `int(batch_size/local_batch_size)` GPUs and perform single-node multi-GPU training with batch size `local_batch_size`:
+```python
+sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
+sysml_model.set(train_algo="allreduce_parallel_batches", parallel_batches=int(batch_size/local_batch_size))
+sysml_model.setGPU(True).setForceGPU(True)
+```
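+
+Conceptually, `looped_minibatch` accumulates gradients over several small batches before performing a single model update. Below is a minimal sketch of these semantics in plain Python (not SystemML's implementation; `forward_backward` and `update` are hypothetical stand-ins for the network's gradient and update steps):
+
+```python
+def looped_minibatch_step(model, X, y, local_batch_size, parallel_batches):
+    grads = None
+    for i in range(parallel_batches):
+        s = i * local_batch_size
+        # One forward-backward pass on a small batch:
+        g = model.forward_backward(X[s:s + local_batch_size], y[s:s + local_batch_size])
+        # Aggregate the gradients across the parallel batches:
+        grads = g if grads is None else [gi + gj for gi, gj in zip(grads, g)]
+    # Finally, update the model once with the aggregated gradients:
+    model.update(grads)
+```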


[systemml] 02/02: [SYSTEMML-540] Updated Keras2DML to match Keras API and improved rmvar performance

Posted by ni...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

niketanpansare pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/systemml.git

commit 47ce2179768b9dd2821aae7fc11f11c1fb8de210
Author: Niketan Pansare <np...@us.ibm.com>
AuthorDate: Wed Mar 27 08:57:02 2019 -0700

    [SYSTEMML-540] Updated Keras2DML to match Keras API and improved rmvar performance
    
    - Improved performance of rmvar by refactoring LocalVariableMap. With this change, the end-to-end performance of a sample run of ResNet-200 improved from 55 seconds to 31 seconds.
    - The parameters batch_size, max_iter, test_iter, test_interval, and display have been removed from the Keras2DML constructor. Use the batch_size, epochs, validation_split, and validation_data parameters of the fit() method instead.
    - Updated the Caffe2DML generator to include the above parameters.
    - Updated the documentation.
    
    Closes #859.
---
 beginners-guide-keras2dml.md                       | 165 ++++-----------------
 gpu.md                                             |  12 +-
 index.md                                           |   4 +-
 reference-guide-caffe2dml.md                       |  68 +++++++++
 ...de-keras2dml.md => reference-guide-keras2dml.md | 142 +++---------------
 5 files changed, 133 insertions(+), 258 deletions(-)

diff --git a/beginners-guide-keras2dml.md b/beginners-guide-keras2dml.md
index 2259397..788a489 100644
--- a/beginners-guide-keras2dml.md
+++ b/beginners-guide-keras2dml.md
@@ -27,34 +27,23 @@ limitations under the License.
 
 <br/>
 
-## Introduction
+# Introduction
 
-Keras2DML is an **experimental API** that converts a Keras specification to DML through the intermediate Caffe2DML module. 
+Keras2DML converts a Keras specification to DML through the intermediate Caffe2DML module. 
 It is designed to fit well into the mllearn framework and hence supports NumPy, Pandas, and PySpark DataFrames.
 
-### Getting Started 
-
-To create a Keras2DML object, one needs to create a Keras model through the Funcitonal API. please see the [Functional API.](https://keras.io/models/model/)
-This module utilizes the existing [Caffe2DML](beginners-guide-caffe2dml) backend to convert Keras models into DML. Keras models are 
-parsed and translated into Caffe prototext and caffemodel files which are then piped into Caffe2DML. Thus one can follow the Caffe2DML
-documentation for further information.
-
-### Model Conversion
-
-Keras models are parsed based on their layer structure and corresponding weights and translated into the relative Caffe layer and weight
-configuration. Be aware that currently this is a translation into Caffe and there will be loss of information from keras models such as 
-intializer information, and other layers which do not exist in Caffe. 
-
 First, install SystemML and other dependencies for the below demo:
 
 ```
-pip install systemml keras tensorflow mlxtend
+pip install systemml keras tensorflow
 ``` 
 
 To create a Keras2DML object, simply pass the Keras model to the Keras2DML constructor. It's also important to note that your models
 should be compiled so that the loss can be accessed by Caffe2DML.
 
+# Training Lenet on the MNIST dataset
 
+Download the MNIST dataset using the [mlxtend package](https://pypi.python.org/pypi/mlxtend).
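+
+For reference, the download itself is a one-liner (this snippet assumes the mlxtend package is installed, e.g. via `pip install mlxtend`):
+
+```python
+from mlxtend.data import mnist_data
+
+# Load the bundled MNIST subset as NumPy arrays (flattened 28x28 images and digit labels)
+X, y = mnist_data()
+```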
 
 ```python
 # pyspark --driver-memory 20g
@@ -115,130 +104,34 @@ sysml_model.fit(X_train, y_train)
 sysml_model.score(X_test, y_test)
 ```
 
-# Frequently asked questions
-
-#### How can I get the training and prediction DML script for the Keras model?
-
-The training and prediction DML scripts can be generated using `get_training_script()` and `get_prediction_script()` methods.
+# Prediction using a pretrained ResNet-50
 
 ```python
-from systemml.mllearn import Keras2DML
-sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224))
-print(sysml_model.get_training_script())
-```
-
-#### What is the mapping between Keras' parameters and Caffe's solver specification ? 
-
-|                                                        | Specified via the given parameter in the Keras2DML constructor | From input Keras' model                                                                 | Corresponding parameter in the Caffe solver file |
-|--------------------------------------------------------|----------------------------------------------------------------|-----------------------------------------------------------------------------------------|--------------------------------------------------|
-| Solver type                                            |                                                                | `type(keras_model.optimizer)`. Supported types: `keras.optimizers.{SGD, Adagrad, Adam}` | `type`                                           |
-| Maximum number of iterations                           | `max_iter`                                                     | The `epoch` parameter in the `fit` method is not supported.                             | `max_iter`                                       |
-| Validation dataset                                     | `test_iter` (explained in the below section)                   | The `validation_data` parameter in the `fit` method is not supported.                   | `test_iter`                                      |
-| Monitoring the loss                                    | `display, test_interval` (explained in the below section)      | The `LossHistory` callback in the `fit` method is not supported.                        | `display, test_interval`                         |
-| Learning rate schedule                                 | `lr_policy`                                                    | The `LearningRateScheduler` callback in the `fit` method is not supported.              | `lr_policy` (default: step)                      |
-| Base learning rate                                     |                                                                | `keras_model.optimizer.lr`                                                              | `base_lr`                                        |
-| Learning rate decay over each update                   |                                                                | `keras_model.optimizer.decay`                                                           | `gamma`                                          |
-| Global regularizer to use for all layers               | `regularization_type,weight_decay`                             | The current version of Keras2DML doesnot support custom regularizers per layer.         | `regularization_type,weight_decay`               |
-| If type of the optimizer is `keras.optimizers.SGD`     |                                                                | `momentum, nesterov`                                                                    | `momentum, type`                                 |
-| If type of the optimizer is `keras.optimizers.Adam`    |                                                                | `beta_1, beta_2, epsilon`. The parameter `amsgrad` is not supported.                    | `momentum, momentum2, delta`                     |
-| If type of the optimizer is `keras.optimizers.Adagrad` |                                                                | `epsilon`                                                                               | `delta`                                          |
-
-#### How do I specify the batch size and the number of epochs ?
+# pyspark --driver-memory 20g
+# Prevent TensorFlow from using the GPU to avoid unnecessary evictions by the SystemML runtime
+import os
+os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
+os.environ['CUDA_VISIBLE_DEVICES'] = ''
 
-Since Keras2DML is a mllearn API, it doesnot accept the batch size and number of epochs as the parameter in the `fit` method.
-Instead, these parameters are passed via `batch_size` and `max_iter` parameters in the Keras2DML constructor.
-For example, the equivalent Python code for `keras_model.fit(features, labels, epochs=10, batch_size=64)` is as follows:
+# Use the channels-first image data format
+from keras import backend as K
+K.set_image_data_format('channels_first')
 
-```python
 from systemml.mllearn import Keras2DML
-epochs = 10
-batch_size = 64
-num_samples = features.shape[0]
-max_iter = int(epochs*math.ceil(num_samples/batch_size))
-sysml_model = Keras2DML(spark, keras_model, batch_size=batch_size, max_iter=max_iter, ...)
-sysml_model.fit(features, labels)
-``` 
-
-#### What optimizer and loss does Keras2DML use by default if `keras_model` is not compiled ?
-
-If the user does not `compile` the keras model, then we throw an error.
-
-For classification applications, you can consider using cross entropy loss and SGD optimizer with nesterov momentum:
-
-```python 
-keras_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.95, decay=5e-4, nesterov=True))
+import systemml as sml
+import keras, urllib
+from PIL import Image
+from keras.applications.resnet50 import preprocess_input, decode_predictions, ResNet50
+
+keras_model = ResNet50(weights='imagenet', include_top=True, pooling=None, input_shape=(3,224,224))
+keras_model.compile(optimizer='sgd', loss='categorical_crossentropy')
+
+sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224), weights='weights_dir', labels='https://raw.githubusercontent.com/apache/systemml/master/scripts/nn/examples/caffe2dml/models/imagenet/labels.txt')
+sysml_model.summary()
+urllib.urlretrieve('https://upload.wikimedia.org/wikipedia/commons/f/f4/Cougar_sitting.jpg', 'test.jpg')
+img_shape = (3, 224, 224)
+input_image = sml.convertImageToNumPyArr(Image.open('test.jpg'), img_shape=img_shape)
+sysml_model.predict(input_image)
 ```
 
-Please refer to [Keras's documentation](https://keras.io/losses/) for more detail.
-
-#### What is the learning rate schedule used ?
-
-Keras2DML does not support the `LearningRateScheduler` callback. 
-Instead one can set the custom learning rate schedule to one of the following schedules by using the `lr_policy` parameter of the constructor:
-- `step`: return `base_lr * gamma ^ (floor(iter / step))` (default schedule)
-- `fixed`: always return `base_lr`.
-- `exp`: return `base_lr * gamma ^ iter`
-- `inv`: return `base_lr * (1 + gamma * iter) ^ (- power)`
-- `poly`: the effective learning rate follows a polynomial decay, to be zero by the max_iter. return `base_lr (1 - iter/max_iter) ^ (power)`
-- `sigmoid`: the effective learning rate follows a sigmod decay return b`ase_lr ( 1/(1 + exp(-gamma * (iter - stepsize))))`
-
-#### How to set the size of the validation dataset ?
-
-The size of the validation dataset is determined by the parameters `test_iter` and the batch size. For example: If the batch size is 64 and 
-`test_iter` is set to 10 in the `Keras2DML`'s constructor, then the validation size is 640. This setting generates following DML code internally:
-
-```python
-num_images = nrow(y_full)
-BATCH_SIZE = 64
-num_validation = 10 * BATCH_SIZE
-X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
-X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
-num_images = nrow(y)
-``` 
-
-#### How to monitor loss via command-line ?
-
-To monitor loss, please set the parameters `display`, `test_iter` and `test_interval` in the `Keras2DML`'s constructor.  
-For example: for the expression `Keras2DML(..., display=100, test_iter=10, test_interval=500)`, we
-- display the training loss and accuracy every 100 iterations and
-- carry out validation every 500 training iterations and display validation loss and accuracy.
-
-#### How do you ensure that Keras2DML produce same results as other Keras' backend?
-
-To verify that Keras2DML produce same results as other Keras' backend, we have [Python unit tests](https://github.com/apache/systemml/blob/master/src/main/python/tests/test_nn_numpy.py)
-that compare the results of Keras2DML with that of TensorFlow. We assume that Keras team ensure that all their backends are consistent with their TensorFlow backend.
-
-#### How can I train very deep models on GPU?
-
-Unlike Keras where default train and test algorithm is `minibatch`, you can specify the
-algorithm using the parameters `train_algo` and `test_algo` (valid values are: `minibatch`, `allreduce_parallel_batches`, 
-`looped_minibatch`, and `allreduce`). Here are some common settings:
-
-|                                                                          | PySpark script                                                                                                                           | Changes to Network/Solver                                              |
-|--------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|
-| Single-node CPU execution (similar to Caffe with solver_mode: CPU)       | `lenet.set(train_algo="minibatch", test_algo="minibatch")`                                                                               | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Single-node single-GPU execution                                         | `lenet.set(train_algo="minibatch", test_algo="minibatch").setGPU(True).setForceGPU(True)`                                                | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Single-node multi-GPU execution (similar to Caffe with solver_mode: GPU) | `lenet.set(train_algo="allreduce_parallel_batches", test_algo="minibatch", parallel_batches=num_gpu).setGPU(True).setForceGPU(True)`     | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-| Distributed prediction                                                   | `lenet.set(test_algo="allreduce")`                                                                                                       |                                                                        |
-| Distributed synchronous training                                         | `lenet.set(train_algo="allreduce_parallel_batches", parallel_batches=num_cluster_cores)`                                                 | Ensure that `batch_size` is set to appropriate value (for example: 64) |
-
-Here are high-level guidelines to train very deep models on GPU with Keras2DML (and Caffe2DML):
-
-1. If there exists at least one layer/operator that does not fit on the device, please allow SystemML's optimizer to perform operator placement based on the memory estimates `sysml_model.setGPU(True)`.
-2. If each individual layer/operator fits on the device but not the entire network with a batch size of 1, then 
-- Rely on SystemML's GPU Memory Manager to perform automatic eviction (recommended): `sysml_model.setGPU(True) # Optional: .setForceGPU(True)`
-- Or enable Nvidia's Unified Memory:  `sysml_model.setConfigProperty('sysml.gpu.memory.allocator', 'unified_memory')`
-3. If the entire neural network does not fit in the GPU memory with the user-specified `batch_size`, but fits in the GPU memory with `local_batch_size` such that `1 << local_batch_size < batch_size`, then
-- Use either of the above two options.
-- Or enable `train_algo` that performs multiple forward-backward pass with batch size `local_batch_size`, aggregate gradients and finally updates the model: 
-```python
-sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
-sysml_model.set(train_algo="looped_minibatch", parallel_batches=int(batch_size/local_batch_size))
-sysml_model.setGPU(True).setForceGPU(True)
-```
-- Or add `int(batch_size/local_batch_size)` GPUs and perform single-node multi-GPU training with batch size `local_batch_size`:
-```python
-sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
-sysml_model.set(train_algo="allreduce_parallel_batches", parallel_batches=int(batch_size/local_batch_size))
-sysml_model.setGPU(True).setForceGPU(True)
-```
+Please see [Keras2DML's reference guide](http://apache.github.io/systemml/reference-guide-keras2dml) for more details.
diff --git a/gpu.md b/gpu.md
index f334b47..c5cdb56 100644
--- a/gpu.md
+++ b/gpu.md
@@ -185,6 +185,16 @@ $ ./bin/x86_64/linux/release/deviceQuery
 $ ./bin/x86_64/linux/release/bandwidthTest 
 $ ./bin/x86_64/linux/release/matrixMulCUBLAS 
 ```
+- Test CUDA and cuDNN with SystemML
+```
+$ git clone https://github.com/apache/systemml.git
+$ cd systemml
+$ mvn -Dit.test=org.apache.sysml.test.gpu.AggregateTernaryTests verify -PgpuTests
+$ mvn -Dit.test=org.apache.sysml.test.gpu.NeuralNetworkOpTests verify -PgpuTests
+```
+
+If you get a `java.lang.UnsatisfiedLinkError: libcusparse.so.9.0: cannot open shared object file: No such file or directory` error, then the
+CUDA toolkit is either not installed correctly or its library directory is not included in `LD_LIBRARY_PATH`.
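+
+A quick way to check from Python whether the loader can find the library (a diagnostic sketch; the exact library name assumes the CUDA 9.0 toolkit):
+
+```python
+import ctypes
+
+try:
+    # Raises OSError if the library is not on the loader path
+    ctypes.CDLL('libcusparse.so.9.0')
+    print('libcusparse found')
+except OSError:
+    print('libcusparse not found: check the CUDA installation and LD_LIBRARY_PATH')
+```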
 
 ### How to install CUDA 9 on Centos 7 with yum?
 
@@ -211,4 +221,4 @@ cd gcc-5.3.0
 num_cores=`grep -c ^processor /proc/cpuinfo`
 make -j $num_cores
 sudo make install
-```
+```
\ No newline at end of file
diff --git a/index.md b/index.md
index 4ceaee6..3169b15 100644
--- a/index.md
+++ b/index.md
@@ -76,8 +76,8 @@ for running SystemML from Spark via Scala, Python, or Java.
 machine in R-like and Python-like declarative languages.
 * [JMLC](jmlc) - Java Machine Learning Connector.
 * [Deep Learning with SystemML](deep-learning)
-  * *Experimental* Caffe2DML API for Deep Learning ([beginner's guide](beginners-guide-caffe2dml), [reference guide](reference-guide-caffe2dml)) - Converts a Caffe specification to DML.
-  * *Experimental* [Keras2DML API](beginners-guide-keras2dml) for Deep Learning.
+  * Keras2DML API for Deep Learning ([beginner's guide](beginners-guide-keras2dml), [reference guide](reference-guide-keras2dml)) - Converts a Keras model to DML.
+  * Caffe2DML API for Deep Learning ([beginner's guide](beginners-guide-caffe2dml), [reference guide](reference-guide-caffe2dml)) - Converts a Caffe specification to DML.
 
 ## Language Guides
 
diff --git a/reference-guide-caffe2dml.md b/reference-guide-caffe2dml.md
index 6242e03..1a3d154 100644
--- a/reference-guide-caffe2dml.md
+++ b/reference-guide-caffe2dml.md
@@ -1135,3 +1135,71 @@ class           precision       recall          f1-score        num_true_labels
 ...
 ```
 
+#### Design document of Caffe2DML
+
+1. Caffe2DML is designed to fit well into the mllearn framework. Hence, the key methods to implement are:
+- `getTrainingScript` for the `Estimator` class.
+- `getPredictionScript` for the `Model` class.
+
+These methods are the starting point for any developer who wants to understand the DML generated for training and prediction, respectively.
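+
+For example, from the Python side these scripts can be inspected via the mllearn wrappers (as documented in the Keras2DML FAQ; `spark` and `keras_model` are assumed to be defined):
+
+```python
+from systemml.mllearn import Keras2DML
+
+sysml_model = Keras2DML(spark, keras_model, input_shape=(3,224,224))
+print(sysml_model.get_training_script())    # DML produced by getTrainingScript
+print(sysml_model.get_prediction_script())  # DML produced by getPredictionScript
+```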
+
+2. To simplify the DML generation in the `getTrainingScript` and `getPredictionScript` methods, we use the `DMLGenerator` interface.
+This interface generates DML strings for common constructs such as control structures (if, for, while) and built-in functions (read, write).
+It also makes the Caffe2DML class easier to read.
+
+3. Here is an analogy to help SystemML developers think about the various moving components of Caffe2DML:
+- Like `Dml.g4` in the `org.apache.sysml.parser.dml` package, `caffe.proto` in the `src/main/proto/caffe` directory
+is used to generate classes to parse the input files.
+
+```
+Dml.g4      ---> antlr  ---> DmlLexer.java, DmlListener.java, DmlParser.java
+caffe.proto ---> protoc ---> target/generated-sources/caffe/Caffe.java
+```
+
+- Just like the classes generated by Dml.g4 are used to parse the input DML file,
+the `target/generated-sources/caffe/Caffe.java` class is used to parse the input Caffe network/deploy prototxt and solver files.
+
+- You can think of a `.caffemodel` file as a DML file with matrix values encoded in it (see the example below).
+So it is possible to read a `.caffemodel` file with the `Caffe.java` class; this is done in Utils.scala's `readCaffeNet` method.
+
+```
+X = matrix("1.2 3.5 0.999 7.123", rows=2, cols=2)
+...
+```
+
+- Just like we convert the AST generated by antlr into our DMLProgram representation, we convert
+Caffe's abstractions into the mapping classes below for layer, solver, and learning rate.
+These mapping classes map the corresponding Caffe abstraction to the SystemML-NN library,
+which greatly simplifies adding new layers to Caffe2DML:
+```
+trait CaffeLayer {
+  // Any layer that wants to reuse SystemML-NN has to override the following methods, which help generate the DML for the given layer:
+  def sourceFileName:String;
+  def init(dmlScript:StringBuilder):Unit;
+  def forward(dmlScript:StringBuilder, isPrediction:Boolean):Unit;
+  def backward(dmlScript:StringBuilder, outSuffix:String):Unit;
+  ...
+}
+trait CaffeSolver {
+  def sourceFileName:String;
+  def update(dmlScript:StringBuilder, layer:CaffeLayer):Unit;
+  def init(dmlScript:StringBuilder, layer:CaffeLayer):Unit;
+}
+```
+
+4. To simplify the traversal of the network, we created the `Network` interface:
+```
+trait Network {
+  def getLayers(): List[String]
+  def getCaffeLayer(layerName:String):CaffeLayer
+  def getBottomLayers(layerName:String): Set[String]
+  def getTopLayers(layerName:String): Set[String]
+  def getLayerID(layerName:String): Int
+}
+```
+
+5. One of the key design restrictions of Caffe2DML is that every layer is identified uniquely by its name.
+This restriction simplifies the code significantly.
+To guard against network files that violate this restriction, Caffe2DML performs rewrites in the CaffeNetwork class (search for conditions 1-5 in the Caffe2DML class).
+
+6. Like Caffe, Caffe2DML also expects the layers to be in sorted order.
diff --git a/beginners-guide-keras2dml.md b/reference-guide-keras2dml.md
similarity index 67%
copy from beginners-guide-keras2dml.md
copy to reference-guide-keras2dml.md
index 2259397..a576ee7 100644
--- a/beginners-guide-keras2dml.md
+++ b/reference-guide-keras2dml.md
@@ -27,93 +27,10 @@ limitations under the License.
 
 <br/>
 
-## Introduction
 
-Keras2DML is an **experimental API** that converts a Keras specification to DML through the intermediate Caffe2DML module. 
-It is designed to fit well into the mllearn framework and hence supports NumPy, Pandas as well as PySpark DataFrame.
+# Layers supported in Keras2DML
 
-### Getting Started 
-
-To create a Keras2DML object, one needs to create a Keras model through the Funcitonal API. please see the [Functional API.](https://keras.io/models/model/)
-This module utilizes the existing [Caffe2DML](beginners-guide-caffe2dml) backend to convert Keras models into DML. Keras models are 
-parsed and translated into Caffe prototext and caffemodel files which are then piped into Caffe2DML. Thus one can follow the Caffe2DML
-documentation for further information.
-
-### Model Conversion
-
-Keras models are parsed based on their layer structure and corresponding weights and translated into the relative Caffe layer and weight
-configuration. Be aware that currently this is a translation into Caffe and there will be loss of information from keras models such as 
-intializer information, and other layers which do not exist in Caffe. 
-
-First, install SystemML and other dependencies for the below demo:
-
-```
-pip install systemml keras tensorflow mlxtend
-``` 
-
-To create a Keras2DML object, simply pass the keras object to the Keras2DML constructor. It's also important to note that your models
-should be compiled so that the loss can be accessed for Caffe2DML.
-
-
-
-```python
-# pyspark --driver-memory 20g
-
-# Disable Tensorflow from using GPU to avoid unnecessary evictions by SystemML runtime
-import os
-os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
-os.environ['CUDA_VISIBLE_DEVICES'] = ''
-
-# Import dependencies
-from mlxtend.data import mnist_data
-import numpy as np
-from sklearn.utils import shuffle
-from keras.models import Sequential
-from keras.layers import Input, Dense, Conv2D, MaxPooling2D, Dropout,Flatten
-from keras import backend as K
-from keras.models import Model
-from keras.optimizers import SGD
-
-# Set channel first layer
-K.set_image_data_format('channels_first')
-
-# Download the MNIST dataset
-X, y = mnist_data()
-X, y = shuffle(X, y)
-
-# Split the data into training and test
-n_samples = len(X)
-X_train = X[:int(.9 * n_samples)]
-y_train = y[:int(.9 * n_samples)]
-X_test = X[int(.9 * n_samples):]
-y_test = y[int(.9 * n_samples):]
-
-# Define Lenet in Keras
-keras_model = Sequential()
-keras_model.add(Conv2D(32, kernel_size=(5, 5), activation='relu', input_shape=(1,28,28), padding='same'))
-keras_model.add(MaxPooling2D(pool_size=(2, 2)))
-keras_model.add(Conv2D(64, (5, 5), activation='relu', padding='same'))
-keras_model.add(MaxPooling2D(pool_size=(2, 2)))
-keras_model.add(Flatten())
-keras_model.add(Dense(512, activation='relu'))
-keras_model.add(Dropout(0.5))
-keras_model.add(Dense(10, activation='softmax'))
-keras_model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True))
-keras_model.summary()
-
-# Scale the input features
-scale = 0.00390625
-X_train = X_train*scale
-X_test = X_test*scale
-
-# Train Lenet using SystemML
-from systemml.mllearn import Keras2DML
-sysml_model = Keras2DML(spark, keras_model, weights='weights_dir')
-# sysml_model.setConfigProperty("sysml.native.blas", "auto")
-# sysml_model.setGPU(True).setForceGPU(True)
-sysml_model.fit(X_train, y_train)
-sysml_model.score(X_test, y_test)
-```
+TODO:
 
 # Frequently asked questions
 
@@ -132,7 +49,6 @@ print(sysml_model.get_training_script())
 |                                                        | Specified via the given parameter in the Keras2DML constructor | From input Keras' model                                                                 | Corresponding parameter in the Caffe solver file |
 |--------------------------------------------------------|----------------------------------------------------------------|-----------------------------------------------------------------------------------------|--------------------------------------------------|
 | Solver type                                            |                                                                | `type(keras_model.optimizer)`. Supported types: `keras.optimizers.{SGD, Adagrad, Adam}` | `type`                                           |
-| Maximum number of iterations                           | `max_iter`                                                     | The `epoch` parameter in the `fit` method is not supported.                             | `max_iter`                                       |
 | Validation dataset                                     | `test_iter` (explained in the below section)                   | The `validation_data` parameter in the `fit` method is not supported.                   | `test_iter`                                      |
 | Monitoring the loss                                    | `display, test_interval` (explained in the below section)      | The `LossHistory` callback in the `fit` method is not supported.                        | `display, test_interval`                         |
 | Learning rate schedule                                 | `lr_policy`                                                    | The `LearningRateScheduler` callback in the `fit` method is not supported.              | `lr_policy` (default: step)                      |
@@ -143,21 +59,11 @@ print(sysml_model.get_training_script())
 | If type of the optimizer is `keras.optimizers.Adam`    |                                                                | `beta_1, beta_2, epsilon`. The parameter `amsgrad` is not supported.                    | `momentum, momentum2, delta`                     |
 | If type of the optimizer is `keras.optimizers.Adagrad` |                                                                | `epsilon`                                                                               | `delta`                                          |
 
-#### How do I specify the batch size and the number of epochs ?
+#### How do I specify the batch size and the number of epochs?
 
-Since Keras2DML is a mllearn API, it doesnot accept the batch size and number of epochs as the parameter in the `fit` method.
-Instead, these parameters are passed via `batch_size` and `max_iter` parameters in the Keras2DML constructor.
-For example, the equivalent Python code for `keras_model.fit(features, labels, epochs=10, batch_size=64)` is as follows:
+Like Keras, the user can provide `batch_size` and `epochs` via the `fit` method. For example: `sysml_model.fit(features, labels, epochs=10, batch_size=64)`.
 
-```python
-from systemml.mllearn import Keras2DML
-epochs = 10
-batch_size = 64
-num_samples = features.shape[0]
-max_iter = int(epochs*math.ceil(num_samples/batch_size))
-sysml_model = Keras2DML(spark, keras_model, batch_size=batch_size, max_iter=max_iter, ...)
-sysml_model.fit(features, labels)
-``` 
+Note that we do not support the `verbose` and `callbacks` parameters in our `fit` method. Please use SparkContext's `setLogLevel` method to control the verbosity.
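+
+For example, a minimal way to reduce the console output (assuming an active `SparkSession` named `spark`):
+
+```python
+# Valid levels include "ALL", "DEBUG", "INFO", "WARN" and "ERROR"
+spark.sparkContext.setLogLevel("ERROR")
+```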
 
 #### What optimizer and loss does Keras2DML use by default if `keras_model` is not compiled?
 
@@ -184,24 +90,9 @@ Instead one can set the custom learning rate schedule to one of the following sc
 
 #### How to set the size of the validation dataset?
 
-The size of the validation dataset is determined by the parameters `test_iter` and the batch size. For example: If the batch size is 64 and 
-`test_iter` is set to 10 in the `Keras2DML`'s constructor, then the validation size is 640. This setting generates following DML code internally:
-
-```python
-num_images = nrow(y_full)
-BATCH_SIZE = 64
-num_validation = 10 * BATCH_SIZE
-X = X_full[(num_validation+1):num_images,]; y = y_full[(num_validation+1):num_images,]
-X_val = X_full[1:num_validation,]; y_val = y_full[1:num_validation,]
-num_images = nrow(y)
-``` 
-
-#### How to monitor loss via command-line ?
-
-To monitor loss, please set the parameters `display`, `test_iter` and `test_interval` in the `Keras2DML`'s constructor.  
-For example: for the expression `Keras2DML(..., display=100, test_iter=10, test_interval=500)`, we
-- display the training loss and accuracy every 100 iterations and
-- carry out validation every 500 training iterations and display validation loss and accuracy.
+Like Keras, the validation dataset can be set in two ways:
+1. `validation_split` parameter (of type `float` between 0 and 1) in the `fit` method: It is the fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch.
+2. `validation_data` parameter (a tuple `(x_val, y_val)` of NumPy arrays) in the `fit` method: data on which to evaluate the loss at the end of each epoch. The model will not be trained on this data. `validation_data` overrides `validation_split`.
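+
+For example (a sketch; `X`, `y`, `X_val`, and `y_val` are assumed to be NumPy arrays):
+
+```python
+# Hold out 10% of the training data for validation:
+sysml_model.fit(X, y, epochs=10, batch_size=64, validation_split=0.1)
+
+# Or pass an explicit validation set (overrides validation_split):
+sysml_model.fit(X, y, epochs=10, batch_size=64, validation_data=(X_val, y_val))
+```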
 
 #### How do you ensure that Keras2DML produces the same results as other Keras backends?
 
@@ -232,13 +123,26 @@ Here are high-level guidelines to train very deep models on GPU with Keras2DML (
 - Use either of the above two options.
- Or use a `train_algo` that performs multiple forward-backward passes with batch size `local_batch_size`, aggregates the gradients, and finally updates the model: 
 ```python
-sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
+sysml_model = Keras2DML(spark, keras_model)
 sysml_model.set(train_algo="looped_minibatch", parallel_batches=int(batch_size/local_batch_size))
 sysml_model.setGPU(True).setForceGPU(True)
+sysml_model.fit(X, y, batch_size=local_batch_size)
 ```
 - Or add `int(batch_size/local_batch_size)` GPUs and perform single-node multi-GPU training with batch size `local_batch_size`:
 ```python
-sysml_model = Keras2DML(spark, keras_model, batch_size=local_batch_size)
+sysml_model = Keras2DML(spark, keras_model)
 sysml_model.set(train_algo="allreduce_parallel_batches", parallel_batches=int(batch_size/local_batch_size))
 sysml_model.setGPU(True).setForceGPU(True)
+sysml_model.fit(X, y, batch_size=local_batch_size)
 ```
+
+#### Design document of Keras2DML
+
+Keras2DML internally utilizes the existing [Caffe2DML](beginners-guide-caffe2dml) backend to convert Keras models into DML. Keras models are
+parsed and translated into Caffe prototxt and caffemodel files, which are then piped into Caffe2DML.
+
+Keras models are parsed based on their layer structure and corresponding weights and translated into the corresponding Caffe layer and weight
+configuration. Be aware that this is currently a translation into Caffe, so there will be some loss of information from Keras models, such as
+initializer information and layers that do not exist in Caffe.
+
+Read [Caffe2DML's reference guide](http://apache.github.io/systemml/reference-guide-caffe2dml) for the design documentation. 
\ No newline at end of file