Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/06/15 00:37:57 UTC

[GitHub] anirudh2290 closed pull request #11267: Add NEWS and README

anirudh2290 closed pull request #11267: Add NEWS and README
URL: https://github.com/apache/incubator-mxnet/pull/11267

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index ed96a6c8371..34e216b3179 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -353,7 +353,11 @@ if(USE_OPENMP)
   find_package(OpenMP REQUIRED)
   # This should build on Windows, but there's some problem and I don't have a Windows box, so
   # could a Windows user please fix?
-  if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt AND SYSTEM_ARCHITECTURE STREQUAL "x86_64" AND NOT MSVC)
+  if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt
+     AND SYSTEM_ARCHITECTURE STREQUAL "x86_64"
+     AND NOT MSVC
+     AND NOT CMAKE_CROSSCOMPILING)
+
     # Intel/llvm OpenMP: https://github.com/llvm-mirror/openmp
     set(OPENMP_STANDALONE_BUILD TRUE)
     set(LIBOMP_ENABLE_SHARED TRUE)
diff --git a/NEWS.md b/NEWS.md
index be9e459c505..1a9c12b0cb1 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,20 @@
 MXNet Change Log
 ================
+## 1.2.1
+### Deprecations
+- An incorrect [usage](https://github.com/apache/incubator-mxnet/issues/11091) of `save_params` was advertised in the gluon book, which led to MXNet users depending on the incorrect usage and developing a hack around it. A change made to the internal structure of the `.params` file saved by `save_params` to resolve a bug broke user scripts that relied on the above-mentioned hack. To fix this, the `save_params` and `load_params` APIs have been reverted to the previous format and marked as deprecated, and new APIs, `save_parameters` and `load_parameters`, have been added for the new format. All scripts that save and load parameters for a Gluon model should now use the new `save_parameters` and `load_parameters` APIs. If your model is hybridizable and you want to export a serialized structure of the model as well as its parameters, use the `export` API and the newly added `imports` API instead of `save_params` and `load_params` (see the sketch after this diff section). For more details, please see: [issue](https://github.com/apache/incubator-mxnet/issues/11091), [PR](https://github.com/apache/incubator-mxnet/pull/11127).
+
+### Bug Fixes
+- Fixed MKLDNN bugs (#10613, #10021, #10616, #10764, #10591, #10731, #10918, #10706, #10651, #10979).
+- Fixed Scala Inference Memory leak (#11216).
+- Fixed Cross Compilation for armv7 (#11054).
+
+### Performance Improvements
+- Reduced memory consumption from inplace operation for ReLU activation (#10847).
+- Improved `slice` operator performance by 20x (#11124).
+- Improved performance of depthwise convolution by using cudnnv7 if available (#11076).
+- Improved performance and memory usage of Conv1D by adding back cuDNN support for Conv1D (#11270). This adds a known issue: the cuDNN convolution operator may throw `CUDNN_STATUS_EXECUTION_FAILED` when `req == "add"` and `cudnn_tune != off` with large inputs (e.g. 64k channels). If you encounter this issue, please consider setting `MXNET_CUDNN_AUTOTUNE_DEFAULT` to 0.
+
 ## 1.2.0
 ### New Features - Added Scala Inference APIs
 - Implemented new [Scala Inference APIs](https://cwiki.apache.org/confluence/display/MXNET/MXNetScalaInferenceAPI) which offer an easy-to-use, Scala Idiomatic and thread-safe high level APIs for performing predictions with deep learning models trained with MXNet (#9678). Implemented a new ImageClassifier class which provides APIs for classification tasks on a Java BufferedImage using a pre-trained model you provide (#10054). Implemented a new ObjectDetector class which provides APIs for object and boundary detections on a Java BufferedImage using a pre-trained model you provide (#10229).
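
A minimal sketch of the save/load flow described in the 1.2.1 NEWS entry above (assuming MXNet 1.2.1; the layer sizes, variable names, and file names are illustrative, not part of this PR):

```python
import mxnet as mx
from mxnet import gluon

# Any Gluon block works; a single Dense layer keeps the sketch small.
net = gluon.nn.Dense(10)
net.initialize()
net(mx.nd.zeros((1, 4)))  # one forward pass so deferred shapes are inferred

# New in 1.2.1: save/load parameters only, replacing the deprecated
# save_params/load_params.
net.save_parameters('net.params')
net.load_parameters('net.params')

# For hybridizable models, export the structure plus parameters, then load
# them back with the newly added SymbolBlock.imports.
hnet = gluon.nn.HybridSequential()
with hnet.name_scope():
    hnet.add(gluon.nn.Dense(10))
hnet.initialize()
hnet.hybridize()
hnet(mx.nd.zeros((1, 4)))      # build and cache the graph before exporting
hnet.export('hnet', epoch=0)   # writes hnet-symbol.json and hnet-0000.params
hnet2 = gluon.SymbolBlock.imports('hnet-symbol.json', ['data'], 'hnet-0000.params')
```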
diff --git a/README.md b/README.md
index c37959d6d74..ea529e49ffd 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ deep learning systems, and interesting insights of DL systems for hackers.
 
 What's New
 ----------
+* [Version 1.2.1 Release](https://github.com/apache/incubator-mxnet/releases/tag/1.2.1) - MXNet 1.2.1 Release.
 * [Version 1.2.0 Release](https://github.com/apache/incubator-mxnet/releases/tag/1.2.0) - MXNet 1.2.0 Release.
 * [Version 1.1.0 Release](https://github.com/apache/incubator-mxnet/releases/tag/1.1.0) - MXNet 1.1.0 Release.
 * [Version 1.0.0 Release](https://github.com/apache/incubator-mxnet/releases/tag/1.0.0) - MXNet 1.0.0 Release.
diff --git a/ci/docker/Dockerfile.build.arm64 b/ci/docker/Dockerfile.build.arm64
index ec949600f73..7a2e1723360 100755
--- a/ci/docker/Dockerfile.build.arm64
+++ b/ci/docker/Dockerfile.build.arm64
@@ -27,13 +27,16 @@ ENV FC /usr/bin/${CROSS_TRIPLE}-gfortran
 ENV HOSTCC gcc
 ENV TARGET ARMV8
 
-WORKDIR /work
+WORKDIR /work/deps
 
-# Build OpenBLAS
-RUN git clone --recursive -b v0.2.20 https://github.com/xianyi/OpenBLAS.git && \
-    cd OpenBLAS && \
-    make -j$(nproc) && \
-    PREFIX=${CROSS_ROOT} make install
+COPY install/ubuntu_arm.sh /work/
+RUN /work/ubuntu_arm.sh
+
+COPY install/arm_openblas.sh /work/
+RUN /work/arm_openblas.sh
+
+ENV OpenBLAS_HOME=${CROSS_ROOT}
+ENV OpenBLAS_DIR=${CROSS_ROOT}
 
 COPY runtime_functions.sh /work/
 WORKDIR /work/mxnet
diff --git a/ci/docker/Dockerfile.build.armv6 b/ci/docker/Dockerfile.build.armv6
index 20739dabe2e..f9ec0f56092 100755
--- a/ci/docker/Dockerfile.build.armv6
+++ b/ci/docker/Dockerfile.build.armv6
@@ -27,11 +27,14 @@ ENV TARGET ARMV6
 
 WORKDIR /work/deps
 
-# Build OpenBLAS
-RUN git clone --recursive -b v0.2.20 https://github.com/xianyi/OpenBLAS.git && \
-    cd OpenBLAS && \
-    make -j$(nproc) && \
-    make PREFIX=$CROSS_ROOT install
+COPY install/ubuntu_arm.sh /work/
+RUN /work/ubuntu_arm.sh
+
+COPY install/arm_openblas.sh /work/
+RUN /work/arm_openblas.sh
+
+ENV OpenBLAS_HOME=${CROSS_ROOT}
+ENV OpenBLAS_DIR=${CROSS_ROOT}
 
 COPY runtime_functions.sh /work/
 WORKDIR /work/mxnet
diff --git a/ci/docker/Dockerfile.build.armv7 b/ci/docker/Dockerfile.build.armv7
index c2493063518..04002dcb292 100755
--- a/ci/docker/Dockerfile.build.armv7
+++ b/ci/docker/Dockerfile.build.armv7
@@ -16,17 +16,25 @@
 # specific language governing permissions and limitations
 # under the License.
 #
-# Dockerfile to build MXNet for Android ARMv7
+# Dockerfile to build MXNet for ARMv7 (Android & RPi)
 
 FROM dockcross/linux-armv7
 
-ENV ARCH armv71
-ENV CC /usr/bin/arm-linux-gnueabihf-gcc
-ENV CXX /usr/bin/arm-linux-gnueabihf-g++
+ENV ARCH armv7l
+ENV HOSTCC gcc
+ENV TARGET ARMV7
+ENV FC /usr/bin/${CROSS_TRIPLE}-gfortran
 
-RUN apt-get update && \
-    apt-get install -y libopenblas-dev:armhf && \
-    rm -rf /var/lib/apt/lists/*
+WORKDIR /work/deps
+
+COPY install/ubuntu_arm.sh /work/
+RUN /work/ubuntu_arm.sh
+
+COPY install/arm_openblas.sh /work/
+RUN /work/arm_openblas.sh
+
+ENV OpenBLAS_HOME=${CROSS_ROOT}
+ENV OpenBLAS_DIR=${CROSS_ROOT}
 
 COPY runtime_functions.sh /work/
-WORKDIR /work/build
+WORKDIR /work/mxnet
diff --git a/ci/docker/Dockerfile.build.jetson b/ci/docker/Dockerfile.build.jetson
index c358edb1fb0..5bbc5d4f4be 100755
--- a/ci/docker/Dockerfile.build.jetson
+++ b/ci/docker/Dockerfile.build.jetson
@@ -31,13 +31,16 @@ ENV FC /usr/bin/${CROSS_TRIPLE}-gfortran
 ENV HOSTCC gcc
 ENV TARGET ARMV8
 
-WORKDIR /work
+WORKDIR /work/deps
 
-# Build OpenBLAS
-RUN git clone --recursive -b v0.2.20 https://github.com/xianyi/OpenBLAS.git && \
-    cd OpenBLAS && \
-    make -j$(nproc) && \
-    PREFIX=${CROSS_ROOT} make install
+COPY install/ubuntu_arm.sh /work/
+RUN /work/ubuntu_arm.sh
+
+COPY install/arm_openblas.sh /work/
+RUN /work/arm_openblas.sh
+
+ENV OpenBLAS_HOME=${CROSS_ROOT}
+ENV OpenBLAS_DIR=${CROSS_ROOT}
 
 # Setup CUDA build env (including configuring and copying nvcc)
 COPY --from=cudabuilder /usr/local/cuda /usr/local/cuda
diff --git a/ci/docker/install/arm_openblas.sh b/ci/docker/install/arm_openblas.sh
new file mode 100755
index 00000000000..fa2e5cae9cb
--- /dev/null
+++ b/ci/docker/install/arm_openblas.sh
@@ -0,0 +1,30 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+
+git clone --recursive -b v0.2.20 https://github.com/xianyi/OpenBLAS.git
+
+cd OpenBLAS
+make -j$(nproc)
+PREFIX=${CROSS_ROOT} make install
+
+cd ..
+
+rm -rf OpenBLAS
diff --git a/ci/docker/install/ubuntu_arm.sh b/ci/docker/install/ubuntu_arm.sh
new file mode 100755
index 00000000000..becb012bd18
--- /dev/null
+++ b/ci/docker/install/ubuntu_arm.sh
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+
+apt update
+apt install -y \
+    unzip
diff --git a/ci/docker/runtime_functions.sh b/ci/docker/runtime_functions.sh
index 8ba6fa36a67..021a60c67be 100755
--- a/ci/docker/runtime_functions.sh
+++ b/ci/docker/runtime_functions.sh
@@ -31,31 +31,50 @@ clean_repo() {
     git submodule update --init --recursive
 }
 
+build_wheel() {
 
-# Build commands: Every platform in docker/Dockerfile.build.<platform> should have a corresponding
-# function here with the same suffix:
-
-build_jetson() {
     set -ex
     pushd .
-    mv make/crosscompile.jetson.mk make/config.mk
-    make -j$(nproc)
 
-    export MXNET_LIBRARY_PATH=`pwd`/libmxnet.so
-    cd /work/mxnet/python
+    PYTHON_DIR=${1:-/work/mxnet/python}
+    BUILD_DIR=${2:-/work/build}
+
+    # build
+
+    export MXNET_LIBRARY_PATH=${BUILD_DIR}/libmxnet.so
+
+    cd ${PYTHON_DIR}
     python setup.py bdist_wheel --universal
 
+    # repackage
+
     # Fix pathing issues in the wheel.  We need to move libmxnet.so from the data folder to the
     # mxnet folder, then repackage the wheel.
     WHEEL=`readlink -f dist/*.whl`
     TMPDIR=`mktemp -d`
-    unzip -d $TMPDIR $WHEEL
-    rm $WHEEL
-    cd $TMPDIR
+    unzip -d ${TMPDIR} ${WHEEL}
+    rm ${WHEEL}
+    cd ${TMPDIR}
     mv *.data/data/mxnet/libmxnet.so mxnet
-    zip -r $WHEEL .
-    cp $WHEEL /work/build
-    rm -rf $TMPDIR
+    zip -r ${WHEEL} .
+    cp ${WHEEL} ${BUILD_DIR}
+    rm -rf ${TMPDIR}
+
+    popd
+}
+
+# Build commands: Every platform in docker/Dockerfile.build.<platform> should have a corresponding
+# function here with the same suffix:
+
+build_jetson() {
+    set -ex
+    pushd .
+
+    cp make/crosscompile.jetson.mk ./config.mk
+    make -j$(nproc)
+
+    build_wheel /work/mxnet/python /work/mxnet/lib
+
     popd
 }
 
@@ -72,7 +91,8 @@ build_armv6() {
     # We do not need OpenMP, since most armv6 systems have only 1 core
 
     cmake \
-        -DCMAKE_TOOLCHAIN_FILE=$CROSS_ROOT/Toolchain.cmake \
+        -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE} \
+	-DCMAKE_CROSSCOMPILING=ON \
         -DUSE_CUDA=OFF \
         -DUSE_OPENCV=OFF \
         -DUSE_OPENMP=OFF \
@@ -83,31 +103,41 @@ build_armv6() {
         -DBUILD_CPP_EXAMPLES=OFF \
         -Dmxnet_LINKER_LIBS=-lgfortran \
         -G Ninja /work/mxnet
+
     ninja
-    export MXNET_LIBRARY_PATH=`pwd`/libmxnet.so
-    cd /work/mxnet/python
-    python setup.py bdist_wheel --universal
-    cp dist/*.whl /work/build
+    build_wheel
+
     popd
 }
 
 build_armv7() {
     set -ex
     pushd .
+    
     cd /work/build
-    cmake\
-        -DUSE_CUDA=OFF\
-        -DUSE_OPENCV=OFF\
-        -DUSE_OPENMP=OFF\
-        -DUSE_SIGNAL_HANDLER=ON\
-        -DCMAKE_BUILD_TYPE=RelWithDebInfo\
-        -DUSE_MKL_IF_AVAILABLE=OFF\
+
+    # Lapack functionality will be included and statically linked to openblas.
+    # But USE_LAPACK needs to be set to OFF, otherwise the main CMakeLists.txt
+    # file tries to add -llapack. Lapack functionality though, requires -lgfortran
+    # to be linked additionally.
+
+    cmake \
+        -DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE} \
+        -DCMAKE_CROSSCOMPILING=ON \
+        -DUSE_CUDA=OFF \
+        -DUSE_OPENCV=OFF \
+        -DUSE_OPENMP=ON \
+        -DUSE_SIGNAL_HANDLER=ON \
+        -DCMAKE_BUILD_TYPE=Release \
+        -DUSE_MKL_IF_AVAILABLE=OFF \
+        -DUSE_LAPACK=OFF \
+        -DBUILD_CPP_EXAMPLES=OFF \
+        -Dmxnet_LINKER_LIBS=-lgfortran \
         -G Ninja /work/mxnet
+
     ninja
-    export MXNET_LIBRARY_PATH=`pwd`/libmxnet.so
-    cd /work/mxnet/python
-    python setup.py bdist_wheel --universal
-    cp dist/*.whl /work/build
+    build_wheel
+
     popd
 }
 
diff --git a/cmake/Modules/FindOpenBLAS.cmake b/cmake/Modules/FindOpenBLAS.cmake
index a3a79caae46..4c0289b866a 100644
--- a/cmake/Modules/FindOpenBLAS.cmake
+++ b/cmake/Modules/FindOpenBLAS.cmake
@@ -52,8 +52,13 @@ SET(Open_BLAS_LIB_SEARCH_PATHS
         ${OpenBLAS_HOME}/lib
  )
 
+# link statically for cross compilations
+if(CMAKE_CROSSCOMPILING)
+  set(OpenBLAS_LIB_NAMES libopenblas.a)
+endif()
+  
 FIND_PATH(OpenBLAS_INCLUDE_DIR NAMES cblas.h PATHS ${Open_BLAS_INCLUDE_SEARCH_PATHS})
-FIND_LIBRARY(OpenBLAS_LIB NAMES openblas PATHS ${Open_BLAS_LIB_SEARCH_PATHS})
+FIND_LIBRARY(OpenBLAS_LIB NAMES ${OpenBLAS_LIB_NAMES} openblas PATHS ${Open_BLAS_LIB_SEARCH_PATHS})
 IF(NOT OpenBLAS_LIB)
 	FIND_FILE(OpenBLAS_LIB NAMES libopenblas.dll.a PATHS ${Open_BLAS_LIB_SEARCH_PATHS})
 ENDIF()
diff --git a/docs/tutorials/gluon/hybrid.md b/docs/tutorials/gluon/hybrid.md
index 3554a15fa3b..fe8ca6fbf48 100644
--- a/docs/tutorials/gluon/hybrid.md
+++ b/docs/tutorials/gluon/hybrid.md
@@ -87,7 +87,7 @@ net(x)
 Hybrid execution can be activated by simply calling `.hybridize()` on the top
 level layer. The first forward call after activation will try to build a
 computation graph from `hybrid_forward` and cache it. On subsequent forward
-calls the cached graph instead of `hybrid_forward` will be invoked:
+calls the cached graph, instead of `hybrid_forward`, will be invoked:
 
 ```python
 net.hybridize()
@@ -105,23 +105,26 @@ Hybridize will speed up execution and save memory. If the top level layer is
 not a `HybridBlock`, you can still call `.hybridize()` on it and Gluon will try
 to hybridize its children layers instead.
 
+Please refer to the [API manual](https://mxnet.incubator.apache.org/api/python/gluon/gluon.html?highlight=hybridize#mxnet.gluon.Block.hybridize)
+for details.
+
 ## Serializing trained model for deployment
 
-Models implemented as `HybridBlock` can be easily serialized for deployment
-using other language front-ends like C, C++ and Scala. To this end, we simply
-forward the model with symbolic variables instead of NDArrays and save the
-output Symbol(s):
+Models implemented as `HybridBlock` can be easily serialized. The serialized
+model can be loaded back later or used for deployment
+with other language front-ends like C, C++ and Scala. To this end, we simply
+use `export` and `SymbolBlock.imports`:
 
 ```python
-x = mx.sym.var('data')
-y = net(x)
-print(y)
-y.save('model.json')
-net.save_params('model.params')
+net.export('model', epoch=1)
 ```
 
-If your network outputs more than one value, you can use `mx.sym.Group` to
-combine them into a grouped Symbol and then save. The saved json and params
-files can then be loaded with C, C++ and Scala interface for prediction.
+Two files `model-symbol.json` and `model-0001.params` are saved on disk.
+You can use other language bindings to load them. You can also load them back
+to gluon with `SymbolBlock`:
+
+```python
+net2 = gluon.SymbolBlock.imports('model-symbol.json', ['data'], 'model-0001.params')
+```
 
 <!-- INSERT SOURCE DOWNLOAD BUTTONS -->
diff --git a/docs/tutorials/gluon/naming.md b/docs/tutorials/gluon/naming.md
index 37b63fa08a9..3606a03dcbd 100644
--- a/docs/tutorials/gluon/naming.md
+++ b/docs/tutorials/gluon/naming.md
@@ -203,12 +203,12 @@ except Exception as e:
     Parameter 'model1_dense0_weight' is missing in file 'model.params', which contains parameters: 'model0_mydense_weight', 'model0_dense1_bias', 'model0_dense1_weight', 'model0_dense0_weight', 'model0_dense0_bias', 'model0_mydense_bias'. Please make sure source and target networks have the same prefix.
 
 
-To solve this problem, we use `save_params`/`load_params` instead of `collect_params` and `save`/`load`. `save_params` uses model structure, instead of parameter name, to match parameters.
+To solve this problem, we use `save_parameters`/`load_parameters` instead of `collect_params` and `save`/`load`. `save_parameters` uses model structure, instead of parameter name, to match parameters.
 
 
 ```python
-model0.save_params('model.params')
-model1.load_params('model.params')
+model0.save_parameters('model.params')
+model1.load_parameters('model.params')
 print(mx.nd.load('model.params').keys())
 ```
 
diff --git a/docs/tutorials/gluon/save_load_params.md b/docs/tutorials/gluon/save_load_params.md
new file mode 100644
index 00000000000..f5f48125cc1
--- /dev/null
+++ b/docs/tutorials/gluon/save_load_params.md
@@ -0,0 +1,261 @@
+# Saving and Loading Gluon Models
+
+Training large models takes a lot of time, so it is a good idea to save the trained models to files to avoid training them again and again. There are a number of reasons to do this. For example, you might want to do inference on a machine that is different from the one where the model was trained. Sometimes a model's performance on the validation set decreases towards the end of training because of overfitting. If you saved your model parameters after every epoch, at the end you can decide to use the model that performs best on the validation set. Another reason would be to train your model using one language (like Python, which has a lot of tools for training) and run inference using a different language (like Scala, perhaps because your application is built on Scala).
+
+In this tutorial, we will learn ways to save and load Gluon models. There are two ways to save/load Gluon models:
+
+**1. Save/load model parameters only**
+
+Parameters of any Gluon model can be saved and loaded using the `save_params` and `load_params` methods. This does not save the model architecture. These methods are used to save parameters of dynamic (non-hybrid) models; the architecture of dynamic models cannot be saved because it can change during execution.
+
+**2. Save/load model parameters AND architecture**
+
+The model architecture of `Hybrid` models stays static and doesn't change during execution. Therefore, both model parameters AND architecture can be saved and loaded using the `export` and `imports` methods.
+
+Let's look at the above methods in more detail. Let's start by importing the modules we'll need.
+
+```python
+from __future__ import print_function
+
+import mxnet as mx
+from mxnet import nd, autograd, gluon
+from mxnet.gluon.data.vision import transforms
+
+import numpy as np
+```
+
+## Setup: build and train a simple model
+
+We need a trained model before we can save it to a file. So let's go ahead and build a very simple convolutional network and train it on MNIST data.
+
+Let's define a helper function to build a LeNet model and another helper to train LeNet with MNIST.
+
+```python
+# Use GPU if one exists, else use CPU
+ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
+
+# MNIST images are 28x28. Total pixels in input layer is 28x28 = 784
+num_inputs = 784
+# Classify the images into one of the 10 digits
+num_outputs = 10
+# 64 images in a batch
+batch_size = 64
+
+# Load the training data
+train_data = gluon.data.DataLoader(gluon.data.vision.MNIST(train=True).transform_first(transforms.ToTensor()),
+                                   batch_size, shuffle=True)
+
+# Build a simple convolutional network
+def build_lenet(net):    
+    with net.name_scope():
+        # First convolution
+        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
+        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
+        # Second convolution
+        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
+        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
+        # Flatten the output before the fully connected layers
+        net.add(gluon.nn.Flatten())
+        # First fully connected layer with 512 neurons
+        net.add(gluon.nn.Dense(512, activation="relu"))
+        # Second fully connected layer with as many neurons as the number of classes
+        net.add(gluon.nn.Dense(num_outputs))
+
+        return net
+
+# Train a given model using MNIST data
+def train_model(model):
+    # Initialize the parameters with Xavier initializer
+    model.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
+    # Use cross entropy loss
+    softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
+    # Use Adam optimizer
+    trainer = gluon.Trainer(model.collect_params(), 'adam', {'learning_rate': .001})
+
+    # Train for one epoch
+    for epoch in range(1):
+        # Iterate through the images and labels in the training data
+        for batch_num, (data, label) in enumerate(train_data):
+            # get the images and labels
+            data = data.as_in_context(ctx)
+            label = label.as_in_context(ctx)
+            # Ask autograd to record the forward pass
+            with autograd.record():
+                # Run the forward pass
+                output = model(data)
+                # Compute the loss
+                loss = softmax_cross_entropy(output, label)
+            # Compute gradients
+            loss.backward()
+            # Update parameters
+            trainer.step(data.shape[0])
+
+            # Print loss once in a while
+            if batch_num % 50 == 0:
+                curr_loss = nd.mean(loss).asscalar()
+                print("Epoch: %d; Batch %d; Loss %f" % (epoch, batch_num, curr_loss))
+```
+
+Let's build a model and train it. After training, we will save and restore this model from a file.
+
+```python
+net = build_lenet(gluon.nn.Sequential())
+train_model(net)
+```
+<pre>Epoch: 0; Batch 0; Loss 2.288904 <!--notebook-skip-line-->
+Epoch: 0; Batch 50; Loss 0.269372 <!--notebook-skip-line-->
+Epoch: 0; Batch 100; Loss 0.238990 <!--notebook-skip-line-->
+Epoch: 0; Batch 150; Loss 0.320592 <!--notebook-skip-line-->
+Epoch: 0; Batch 200; Loss 0.048619 <!--notebook-skip-line-->
+Epoch: 0; Batch 250; Loss 0.121555 <!--notebook-skip-line-->
+Epoch: 0; Batch 300; Loss 0.083645 <!--notebook-skip-line-->
+Epoch: 0; Batch 350; Loss 0.040627 <!--notebook-skip-line-->
+Epoch: 0; Batch 400; Loss 0.195946 <!--notebook-skip-line-->
+Epoch: 0; Batch 450; Loss 0.155514 <!--notebook-skip-line-->
+Epoch: 0; Batch 500; Loss 0.031762 <!--notebook-skip-line-->
+Epoch: 0; Batch 550; Loss 0.056516 <!--notebook-skip-line-->
+Epoch: 0; Batch 600; Loss 0.095174 <!--notebook-skip-line-->
+Epoch: 0; Batch 650; Loss 0.054901 <!--notebook-skip-line-->
+Epoch: 0; Batch 700; Loss 0.030067 <!--notebook-skip-line-->
+Epoch: 0; Batch 750; Loss 0.102611 <!--notebook-skip-line-->
+Epoch: 0; Batch 800; Loss 0.010036 <!--notebook-skip-line-->
+Epoch: 0; Batch 850; Loss 0.051853 <!--notebook-skip-line-->
+Epoch: 0; Batch 900; Loss 0.008402 <!--notebook-skip-line-->
+</pre> <!--notebook-skip-line-->
+
+## Saving model parameters to file
+
+Okay, we now have a model (`net`) that we can save to a file. Let's save its parameters using the `save_params` function.
+
+```python
+file_name = "net.params"
+net.save_params(file_name)
+```
+
+We have successfully saved the parameters of the model into a file.
+
+Note: `Block.collect_params().save()` is not a recommended way to save parameters of a Gluon network if you plan to load the parameters back into a Gluon network using `Block.load_params()`.
+
+## Loading model parameters from file
+
+Let's now create a network with the parameters we saved into the file. We build the network again using the helper first and then load the weights from the file we saved using the `load_params` function.
+
+```python
+new_net = build_lenet(gluon.nn.Sequential())
+new_net.load_params(file_name, ctx=ctx)
+```
+
+Note that to do this, we need the definition of the network as Python code. If we want to recreate this network on a different machine using the saved weights, we need the same Python code (`build_lenet`) that created the network to create the `new_net` object shown above. This means Python code needs to be copied over to any machine where we want to run this network.
+
+If our network is [Hybrid](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html), we can even save the network architecture into files and we won't need the network definition in a Python file to load the network. We'll see how to do it in the next section.
+
+Let's test the model we just loaded from file.
+
+```python
+import matplotlib.pyplot as plt
+
+def verify_loaded_model(net):
+    """Run inference using ten random images.
+    Print both input and output of the model"""
+
+    def transform(data, label):
+        return data.astype(np.float32)/255, label.astype(np.float32)
+
+    # Load ten random images from the test dataset
+    sample_data = mx.gluon.data.DataLoader(mx.gluon.data.vision.MNIST(train=False, transform=transform),
+                                  10, shuffle=True)
+
+    for data, label in sample_data:
+
+        # Display the images
+        img = nd.transpose(data, (1,0,2,3))
+        img = nd.reshape(img, (28,10*28,1))
+        imtiles = nd.tile(img, (1,1,3))
+        plt.imshow(imtiles.asnumpy())
+        plt.show()
+
+        # Display the predictions
+        data = nd.transpose(data, (0, 3, 1, 2))
+        out = net(data.as_in_context(ctx))
+        predictions = nd.argmax(out, axis=1)
+        print('Model predictions: ', predictions.asnumpy())
+
+        break
+
+verify_loaded_model(new_net)
+```
+![Model inputs](https://raw.githubusercontent.com/indhub/web-data/4a9c100aa996df3dff0e7f493029d411c2b526c3/mxnet/tutorials/gluon/save_load_params/mnist_in_1.png) <!--notebook-skip-line-->
+
+Model predictions:  [1. 1. 4. 5. 0. 5. 7. 0. 3. 6.] <!--notebook-skip-line-->
+
+## Saving model parameters AND architecture to file
+
+[Hybrid](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html) models can be serialized as JSON files using the `export` function. Once serialized, these models can be loaded from other language bindings like C++ or Scala for faster inference or inference in different environments.
+
+Note that the network we created above is not a Hybrid network and therefore cannot be serialized into a JSON file. So, let's create a Hybrid version of the same network and train it.
+
+```python
+net = build_lenet(gluon.nn.HybridSequential())
+net.hybridize()
+train_model(net)
+```
+
+<pre>Epoch: 0; Batch 0; Loss 2.323284 <!--notebook-skip-line-->
+Epoch: 0; Batch 50; Loss 0.444733 <!--notebook-skip-line-->
+Epoch: 0; Batch 100; Loss 0.103407 <!--notebook-skip-line-->
+Epoch: 0; Batch 150; Loss 0.166772 <!--notebook-skip-line-->
+Epoch: 0; Batch 200; Loss 0.227569 <!--notebook-skip-line-->
+Epoch: 0; Batch 250; Loss 0.069515 <!--notebook-skip-line-->
+Epoch: 0; Batch 300; Loss 0.074086 <!--notebook-skip-line-->
+Epoch: 0; Batch 350; Loss 0.074382 <!--notebook-skip-line-->
+Epoch: 0; Batch 400; Loss 0.026569 <!--notebook-skip-line-->
+Epoch: 0; Batch 450; Loss 0.097248 <!--notebook-skip-line-->
+Epoch: 0; Batch 500; Loss 0.059895 <!--notebook-skip-line-->
+Epoch: 0; Batch 550; Loss 0.053194 <!--notebook-skip-line-->
+Epoch: 0; Batch 600; Loss 0.076294 <!--notebook-skip-line-->
+Epoch: 0; Batch 650; Loss 0.047274 <!--notebook-skip-line-->
+Epoch: 0; Batch 700; Loss 0.007898 <!--notebook-skip-line-->
+Epoch: 0; Batch 750; Loss 0.039478 <!--notebook-skip-line-->
+Epoch: 0; Batch 800; Loss 0.031342 <!--notebook-skip-line-->
+Epoch: 0; Batch 850; Loss 0.059289 <!--notebook-skip-line-->
+Epoch: 0; Batch 900; Loss 0.037809 <!--notebook-skip-line-->
+</pre> <!--notebook-skip-line-->
+
+We now have a trained hybrid network. This can be exported into files using the `export` function. The `export` function will export the model architecture into a `.json` file and model parameters into a `.params` file.
+
+```python
+net.export("lenet", epoch=1)
+```
+
+`export` in this case creates `lenet-symbol.json` and `lenet-0001.params` in the current directory.
+
+## Loading model parameters AND architecture from file
+
+### From a different frontend
+
+One of the main reasons to serialize model architecture into a JSON file is to load it from a different frontend like C, C++ or Scala. Here are a couple of examples:
+1. [Loading serialized Hybrid networks from C](https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/predict-cpp/image-classification-predict.cc)
+2. [Loading serialized Hybrid networks from Scala](https://github.com/apache/incubator-mxnet/blob/master/scala-package/infer/src/main/scala/org/apache/mxnet/infer/ImageClassifier.scala)
+
+### From Python
+
+Serialized Hybrid networks (saved as `.json` and `.params` files) can be loaded and used inside the Python frontend using `gluon.nn.SymbolBlock`. To demonstrate that, let's load the network we serialized above.
+
+```python
+deserialized_net = gluon.nn.SymbolBlock.imports("lenet-symbol.json", ['data'], "lenet-0001.params")
+```
+
+`deserialized_net` now contains the network we deserialized from files. Let's test the deserialized network to make sure it works.
+
+```python
+verify_loaded_model(deserialized_net)
+```
+
+![Model inputs](https://raw.githubusercontent.com/indhub/web-data/4a9c100aa996df3dff0e7f493029d411c2b526c3/mxnet/tutorials/gluon/save_load_params/mnist_in_2.png) <!--notebook-skip-line-->
+
+Model predictions:  [4. 8. 0. 1. 5. 5. 8. 8. 1. 9.] <!--notebook-skip-line-->
+
+That's all! We learned how to save and load Gluon networks from files. Parameters of any Gluon network can be persisted into files. For hybrid networks, both the architecture of the network and the parameters can be saved to and loaded from files.
+
+<!-- INSERT SOURCE DOWNLOAD BUTTONS -->
diff --git a/example/gluon/dcgan.py b/example/gluon/dcgan.py
index 3233f430eea..8ac9c522cf5 100644
--- a/example/gluon/dcgan.py
+++ b/example/gluon/dcgan.py
@@ -229,8 +229,8 @@ def transformer(data, label):
     logging.info('time: %f' % (time.time() - tic))
 
     if check_point:
-        netG.save_params(os.path.join(outf,'generator_epoch_%d.params' %epoch))
-        netD.save_params(os.path.join(outf,'discriminator_epoch_%d.params' % epoch))
+        netG.save_parameters(os.path.join(outf,'generator_epoch_%d.params' %epoch))
+        netD.save_parameters(os.path.join(outf,'discriminator_epoch_%d.params' % epoch))
 
-netG.save_params(os.path.join(outf, 'generator.params'))
-netD.save_params(os.path.join(outf, 'discriminator.params'))
+netG.save_parameters(os.path.join(outf, 'generator.params'))
+netD.save_parameters(os.path.join(outf, 'discriminator.params'))
diff --git a/example/gluon/embedding_learning/train.py b/example/gluon/embedding_learning/train.py
index 46f76b55614..b8a5bf2716c 100644
--- a/example/gluon/embedding_learning/train.py
+++ b/example/gluon/embedding_learning/train.py
@@ -246,7 +246,7 @@ def train(epochs, ctx):
         if val_accs[0] > best_val:
             best_val = val_accs[0]
             logging.info('Saving %s.' % opt.save_model_prefix)
-            net.save_params('%s.params' % opt.save_model_prefix)
+            net.save_parameters('%s.params' % opt.save_model_prefix)
     return best_val
 
 
diff --git a/example/gluon/image_classification.py b/example/gluon/image_classification.py
index a67a31790a0..2cf12f091eb 100644
--- a/example/gluon/image_classification.py
+++ b/example/gluon/image_classification.py
@@ -122,7 +122,7 @@ def get_model(model, ctx, opt):
 
     net = models.get_model(model, **kwargs)
     if opt.resume:
-        net.load_params(opt.resume)
+        net.load_parameters(opt.resume)
     elif not opt.use_pretrained:
         if model in ['alexnet']:
             net.initialize(mx.init.Normal())
@@ -176,12 +176,12 @@ def update_learning_rate(lr, trainer, epoch, ratio, steps):
 def save_checkpoint(epoch, top1, best_acc):
     if opt.save_frequency and (epoch + 1) % opt.save_frequency == 0:
         fname = os.path.join(opt.prefix, '%s_%d_acc_%.4f.params' % (opt.model, epoch, top1))
-        net.save_params(fname)
+        net.save_parameters(fname)
         logger.info('[Epoch %d] Saving checkpoint to %s with Accuracy: %.4f', epoch, fname, top1)
     if top1 > best_acc[0]:
         best_acc[0] = top1
         fname = os.path.join(opt.prefix, '%s_best.params' % (opt.model))
-        net.save_params(fname)
+        net.save_parameters(fname)
         logger.info('[Epoch %d] Saving checkpoint to %s with Accuracy: %.4f', epoch, fname, top1)
 
 def train(opt, ctx):
@@ -267,7 +267,7 @@ def main():
                 optimizer = 'sgd',
                 optimizer_params = {'learning_rate': opt.lr, 'wd': opt.wd, 'momentum': opt.momentum, 'multi_precision': True},
                 initializer = mx.init.Xavier(magnitude=2))
-        mod.save_params('image-classifier-%s-%d-final.params'%(opt.model, opt.epochs))
+        mod.save_parameters('image-classifier-%s-%d-final.params'%(opt.model, opt.epochs))
     else:
         if opt.mode == 'hybrid':
             net.hybridize()
diff --git a/example/gluon/mnist.py b/example/gluon/mnist.py
index 198d7ca5ab2..6aea3abc504 100644
--- a/example/gluon/mnist.py
+++ b/example/gluon/mnist.py
@@ -117,7 +117,7 @@ def train(epochs, ctx):
         name, val_acc = test(ctx)
         print('[Epoch %d] Validation: %s=%f'%(epoch, name, val_acc))
 
-    net.save_params('mnist.params')
+    net.save_parameters('mnist.params')
 
 
 if __name__ == '__main__':
diff --git a/example/gluon/style_transfer/main.py b/example/gluon/style_transfer/main.py
index 7fcc927f9cb..c67b830fe72 100644
--- a/example/gluon/style_transfer/main.py
+++ b/example/gluon/style_transfer/main.py
@@ -54,7 +54,7 @@ def train(args):
     style_model.initialize(init=mx.initializer.MSRAPrelu(), ctx=ctx)
     if args.resume is not None:
         print('Resuming, initializing using weight from {}.'.format(args.resume))
-        style_model.load_params(args.resume, ctx=ctx)
+        style_model.load_parameters(args.resume, ctx=ctx)
     print('style_model:',style_model)
     # optimizer and loss
     trainer = gluon.Trainer(style_model.collect_params(), 'adam',
@@ -118,14 +118,14 @@ def train(args):
                 save_model_filename = "Epoch_" + str(e) + "iters_" + str(count) + "_" + str(time.ctime()).replace(' ', '_') + "_" + str(
                     args.content_weight) + "_" + str(args.style_weight) + ".params"
                 save_model_path = os.path.join(args.save_model_dir, save_model_filename)
-                style_model.save_params(save_model_path)
+                style_model.save_parameters(save_model_path)
                 print("\nCheckpoint, trained model saved at", save_model_path)
 
     # save model
     save_model_filename = "Final_epoch_" + str(args.epochs) + "_" + str(time.ctime()).replace(' ', '_') + "_" + str(
         args.content_weight) + "_" + str(args.style_weight) + ".params"
     save_model_path = os.path.join(args.save_model_dir, save_model_filename)
-    style_model.save_params(save_model_path)
+    style_model.save_parameters(save_model_path)
     print("\nDone, trained model saved at", save_model_path)
 
 
@@ -140,7 +140,7 @@ def evaluate(args):
     style_image = utils.preprocess_batch(style_image)
     # model
     style_model = net.Net(ngf=args.ngf)
-    style_model.load_params(args.model, ctx=ctx)
+    style_model.load_parameters(args.model, ctx=ctx)
     # forward
     style_model.setTarget(style_image)
     output = style_model(content_image)
diff --git a/example/gluon/super_resolution.py b/example/gluon/super_resolution.py
index 38c3bec8949..0f2f21f3c0a 100644
--- a/example/gluon/super_resolution.py
+++ b/example/gluon/super_resolution.py
@@ -168,13 +168,13 @@ def train(epoch, ctx):
         print('training mse at epoch %d: %s=%f'%(i, name, acc))
         test(ctx)
 
-    net.save_params('superres.params')
+    net.save_parameters('superres.params')
 
 def resolve(ctx):
     from PIL import Image
     if isinstance(ctx, list):
         ctx = [ctx[0]]
-    net.load_params('superres.params', ctx=ctx)
+    net.load_parameters('superres.params', ctx=ctx)
     img = Image.open(opt.resolve_img).convert('YCbCr')
     y, cb, cr = img.split()
     data = mx.nd.expand_dims(mx.nd.expand_dims(mx.nd.array(y), axis=0), axis=0)
diff --git a/example/gluon/tree_lstm/main.py b/example/gluon/tree_lstm/main.py
index d2fe464638a..ad5d59f7a47 100644
--- a/example/gluon/tree_lstm/main.py
+++ b/example/gluon/tree_lstm/main.py
@@ -138,7 +138,7 @@ def test(ctx, data_iter, best, mode='validation', num_iter=-1):
         if test_r >= best:
             best = test_r
             logging.info('New optimum found: {}. Checkpointing.'.format(best))
-            net.save_params('childsum_tree_lstm_{}.params'.format(num_iter))
+            net.save_parameters('childsum_tree_lstm_{}.params'.format(num_iter))
             test(ctx, test_iter, -1, 'test')
         return best
 
diff --git a/example/gluon/word_language_model/train.py b/example/gluon/word_language_model/train.py
index 9e152636bb0..7f0a916b79b 100644
--- a/example/gluon/word_language_model/train.py
+++ b/example/gluon/word_language_model/train.py
@@ -185,7 +185,7 @@ def train():
         if val_L < best_val:
             best_val = val_L
             test_L = eval(test_data)
-            model.save_params(args.save)
+            model.save_parameters(args.save)
             print('test loss %.2f, test ppl %.2f'%(test_L, math.exp(test_L)))
         else:
             args.lr = args.lr*0.25
@@ -193,6 +193,6 @@ def train():
 
 if __name__ == '__main__':
     train()
-    model.load_params(args.save, context)
+    model.load_parameters(args.save, context)
     test_L = eval(test_data)
     print('Best test loss %.2f, test ppl %.2f'%(test_L, math.exp(test_L)))
diff --git a/python/mxnet/gluon/block.py b/python/mxnet/gluon/block.py
index 0f415436116..d8de11b5131 100644
--- a/python/mxnet/gluon/block.py
+++ b/python/mxnet/gluon/block.py
@@ -16,7 +16,7 @@
 # under the License.
 
 # coding: utf-8
-# pylint: disable= arguments-differ
+# pylint: disable= arguments-differ, too-many-lines
 """Base container class for all neural network models."""
 __all__ = ['Block', 'HybridBlock', 'SymbolBlock']
 
@@ -148,7 +148,8 @@ def forward(self, x):
 
 
     Child :py:class:`Block` assigned this way will be registered and :py:meth:`collect_params`
-    will collect their Parameters recursively.
+    will collect their Parameters recursively. You can also manually register
+    child blocks with :py:meth:`register_child`.
 
     Parameters
     ----------
@@ -265,12 +266,12 @@ def collect_params(self, select=None):
         children's Parameters(default), also can returns the select :py:class:`ParameterDict`
         which match some given regular expressions.
 
-        For example, collect the specified parameter in ['conv1_weight', 'conv1_bias', 'fc_weight',
+        For example, collect the specified parameters in ['conv1_weight', 'conv1_bias', 'fc_weight',
         'fc_bias']::
 
             model.collect_params('conv1_weight|conv1_bias|fc_weight|fc_bias')
 
-        or collect all paramters which their name ends with 'weight' or 'bias', this can be done
+        or collect all parameters whose names end with 'weight' or 'bias'; this can be done
         using regular expressions::
 
             model.collect_params('.*weight|.*bias')
@@ -304,9 +305,22 @@ def _collect_params_with_prefix(self, prefix=''):
             ret.update(child._collect_params_with_prefix(prefix + name))
         return ret
 
-    def save_params(self, filename):
+    def save_parameters(self, filename):
         """Save parameters to file.
+        This function is to be used to save parameters of a Gluon model. Note that
+        the saved parameters are not meant to be loaded in a different language binding for now.
+        Saving parameters using `.save_parameters()` is different from
+        `.collect_params().save()` and `.save_params()`, which are deprecated ways
+        to save the parameters of a model and should be avoided.
+
+        If your model is hybridizable and you want to export a serialized version of the
+        structure of the model as well as its parameters, please refer to
+        :py:meth:`HybridBlock.export`.
+        Refer to this tutorial for a complete overview of saving/loading models with
+        MXNet: https://mxnet.incubator.apache.org/tutorials/gluon/save_load_params.html
 
+        Parameters
+        ----------
         filename : str
             Path to file.
         """
@@ -314,14 +328,35 @@ def save_params(self, filename):
         arg_dict = {key : val._reduce() for key, val in params.items()}
         ndarray.save(filename, arg_dict)
 
-    def load_params(self, filename, ctx=None, allow_missing=False,
-                    ignore_extra=False):
+    def save_params(self, filename):
+        """[Deprecated] Please use save_parameters.
+
+        Save parameters to file.
+
+        filename : str
+            Path to file.
+        """
+        warnings.warn("save_params is deprecated. Please use save_parameters.")
+        try:
+            self.collect_params().save(filename, strip_prefix=self.prefix)
+        except ValueError as e:
+            raise ValueError('%s\nsave_params is deprecated. Using ' \
+                              'save_parameters may resolve this error.'%e.message)
+
+    def load_parameters(self, filename, ctx=None, allow_missing=False,
+                        ignore_extra=False):
         """Load parameters from file.
+        This function is to be used to load parameters of a Gluon model that were
+        saved using the `.save_parameters()` function. Any other use is undefined behaviour.
+        Refer to this tutorial for a complete overview of saving/loading models with
+        MXNet: https://mxnet.incubator.apache.org/tutorials/gluon/save_load_params.html
 
+        Parameters
+        ----------
         filename : str
             Path to parameter file.
         ctx : Context or list of Context, default cpu()
-            Context(s) initialize loaded parameters on.
+            Context(s) to initialize loaded parameters on.
         allow_missing : bool, default False
            Whether to silently skip loading parameters not represented in the file.
         ignore_extra : bool, default False
@@ -355,6 +390,25 @@ def load_params(self, filename, ctx=None, allow_missing=False,
             params[name]._load_init(loaded[name], ctx)
 
 
+    def load_params(self, filename, ctx=None, allow_missing=False,
+                    ignore_extra=False):
+        """[Deprecated] Please use load_parameters.
+
+        Load parameters from file.
+
+        filename : str
+            Path to parameter file.
+        ctx : Context or list of Context, default cpu()
+            Context(s) to initialize loaded parameters on.
+        allow_missing : bool, default False
+            Whether to silently skip loading parameters not represented in the file.
+        ignore_extra : bool, default False
+            Whether to silently ignore parameters from the file that are not
+            present in this Block.
+        """
+        warnings.warn("load_params is deprecated. Please use load_parameters.")
+        self.load_parameters(filename, ctx, allow_missing, ignore_extra)
+
     def register_child(self, block, name=None):
         """Registers block as a child of self. :py:class:`Block` s assigned to self as
         attributes will be registered automatically."""
@@ -428,9 +482,31 @@ def forward(self, *args):
 class HybridBlock(Block):
     """`HybridBlock` supports forwarding with both Symbol and NDArray.
 
+    `HybridBlock` is similar to `Block`, with a few differences::
+
+        import mxnet as mx
+        from mxnet.gluon import HybridBlock, nn
+
+        class Model(HybridBlock):
+            def __init__(self, **kwargs):
+                super(Model, self).__init__(**kwargs)
+                # use name_scope to give child Blocks appropriate names.
+                with self.name_scope():
+                    self.dense0 = nn.Dense(20)
+                    self.dense1 = nn.Dense(20)
+
+            def hybrid_forward(self, F, x):
+                x = F.relu(self.dense0(x))
+                return F.relu(self.dense1(x))
+
+        model = Model()
+        model.initialize(ctx=mx.cpu(0))
+        model.hybridize()
+        model(mx.nd.zeros((10, 10), ctx=mx.cpu(0)))
+
     Forward computation in :py:class:`HybridBlock` must be static to work with :py:class:`Symbol` s,
     i.e. you cannot call :py:meth:`NDArray.asnumpy`, :py:attr:`NDArray.shape`,
-    :py:attr:`NDArray.dtype`, etc on tensors.
+    :py:attr:`NDArray.dtype`, `NDArray` indexing (`x[i]`), etc. on tensors.
    Also, you cannot use branching or loop logic that depends on non-constant
     expressions like random numbers or intermediate results, since they change
     the graph structure for each iteration.
@@ -440,8 +516,12 @@ class HybridBlock(Block):
     representing the forward computation and cache it. On subsequent forwards,
     the cached graph will be used instead of :py:meth:`hybrid_forward`.
 
-    Refer `Hybrid tutorial <http://mxnet.io/tutorials/gluon/hybrid.html>`_ to see
-    the end-to-end usage.
+    Please see references for detailed tutorial.
+
+    References
+    ----------
+        `Hybrid - Faster training and easy deployment
+        <http://mxnet.io/tutorials/gluon/hybrid.html>`_
     """
     def __init__(self, prefix=None, params=None):
         super(HybridBlock, self).__init__(prefix=prefix, params=params)
@@ -579,8 +659,8 @@ def infer_type(self, *args):
         self._infer_attrs('infer_type', 'dtype', *args)
 
     def export(self, path, epoch=0):
-        """Export HybridBlock to json format that can be loaded by `mxnet.mod.Module`
-        or the C++ interface.
+        """Export HybridBlock to json format that can be loaded by
+        `SymbolBlock.imports`, `mxnet.mod.Module` or the C++ interface.
 
        .. note:: When there is only one input, it will be named `data`. When there
                  are multiple inputs, they will be named `data0`, `data1`, etc.
@@ -681,6 +761,50 @@ class SymbolBlock(HybridBlock):
     >>> x = mx.nd.random.normal(shape=(16, 3, 224, 224))
     >>> print(feat_model(x))
     """
+    @staticmethod
+    def imports(symbol_file, input_names, param_file=None, ctx=None):
+        """Import model previously saved by `HybridBlock.export` or
+        `Module.save_checkpoint` as a SymbolBlock for use in Gluon.
+
+        Parameters
+        ----------
+        symbol_file : str
+            Path to symbol file.
+        input_names : list of str
+            List of input variable names
+        param_file : str, optional
+            Path to parameter file.
+        ctx : Context, default None
+            The context to initialize SymbolBlock on.
+
+        Returns
+        -------
+        SymbolBlock
+            SymbolBlock loaded from symbol and parameter files.
+
+        Examples
+        --------
+        >>> net1 = gluon.model_zoo.vision.resnet18_v1(
+        ...     prefix='resnet', pretrained=True)
+        >>> net1.hybridize()
+        >>> x = mx.nd.random.normal(shape=(1, 3, 32, 32))
+        >>> out1 = net1(x)
+        >>> net1.export('net1', epoch=1)
+        >>>
+        >>> net2 = gluon.SymbolBlock.imports(
+        ...     'net1-symbol.json', ['data'], 'net1-0001.params')
+        >>> out2 = net2(x)
+        """
+        sym = symbol.load(symbol_file)
+        if isinstance(input_names, str):
+            input_names = [input_names]
+        inputs = [symbol.var(i) for i in input_names]
+        ret = SymbolBlock(sym, inputs)
+        if param_file is not None:
+            ret.collect_params().load(param_file, ctx=ctx)
+        return ret
+
+
     def __init__(self, outputs, inputs, params=None):
         super(SymbolBlock, self).__init__(prefix=None, params=None)
         self._prefix = ''
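
To make the static-graph constraint in the `HybridBlock` docstring above concrete, here is a sketch of code that breaks once hybridized (the class and all names are invented for illustration, not part of this PR):

```python
import mxnet as mx
from mxnet.gluon import HybridBlock, nn

class BranchyModel(HybridBlock):
    """Illustrative only: violates the static-graph rules described above."""
    def __init__(self, **kwargs):
        super(BranchyModel, self).__init__(**kwargs)
        with self.name_scope():
            self.dense = nn.Dense(20)

    def hybrid_forward(self, F, x):
        # Works imperatively, where x is an NDArray. After hybridize(), x is
        # a Symbol with no asnumpy(), so building the cached graph fails here.
        if x.asnumpy().sum() > 0:
            return F.relu(self.dense(x))
        return self.dense(x)

model = BranchyModel()
model.initialize(ctx=mx.cpu(0))
model(mx.nd.ones((10, 10)))  # fine: imperative execution
model.hybridize()
model(mx.nd.ones((10, 10)))  # raises: 'Symbol' object has no attribute 'asnumpy'
```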
diff --git a/python/mxnet/gluon/model_zoo/vision/alexnet.py b/python/mxnet/gluon/model_zoo/vision/alexnet.py
index 55499470460..fdb006258c2 100644
--- a/python/mxnet/gluon/model_zoo/vision/alexnet.py
+++ b/python/mxnet/gluon/model_zoo/vision/alexnet.py
@@ -83,5 +83,5 @@ def alexnet(pretrained=False, ctx=cpu(),
     net = AlexNet(**kwargs)
     if pretrained:
         from ..model_store import get_model_file
-        net.load_params(get_model_file('alexnet', root=root), ctx=ctx)
+        net.load_parameters(get_model_file('alexnet', root=root), ctx=ctx)
     return net
diff --git a/python/mxnet/gluon/model_zoo/vision/densenet.py b/python/mxnet/gluon/model_zoo/vision/densenet.py
index 835336739a6..b03f5ce8d52 100644
--- a/python/mxnet/gluon/model_zoo/vision/densenet.py
+++ b/python/mxnet/gluon/model_zoo/vision/densenet.py
@@ -141,7 +141,7 @@ def get_densenet(num_layers, pretrained=False, ctx=cpu(),
     net = DenseNet(num_init_features, growth_rate, block_config, **kwargs)
     if pretrained:
         from ..model_store import get_model_file
-        net.load_params(get_model_file('densenet%d'%(num_layers), root=root), ctx=ctx)
+        net.load_parameters(get_model_file('densenet%d'%(num_layers), root=root), ctx=ctx)
     return net
 
 def densenet121(**kwargs):
diff --git a/python/mxnet/gluon/model_zoo/vision/inception.py b/python/mxnet/gluon/model_zoo/vision/inception.py
index 6d75050b83f..7c54691f1b5 100644
--- a/python/mxnet/gluon/model_zoo/vision/inception.py
+++ b/python/mxnet/gluon/model_zoo/vision/inception.py
@@ -216,5 +216,5 @@ def inception_v3(pretrained=False, ctx=cpu(),
     net = Inception3(**kwargs)
     if pretrained:
         from ..model_store import get_model_file
-        net.load_params(get_model_file('inceptionv3', root=root), ctx=ctx)
+        net.load_parameters(get_model_file('inceptionv3', root=root), ctx=ctx)
     return net
diff --git a/python/mxnet/gluon/model_zoo/vision/mobilenet.py b/python/mxnet/gluon/model_zoo/vision/mobilenet.py
index 7c3b7d643eb..f75de2197f7 100644
--- a/python/mxnet/gluon/model_zoo/vision/mobilenet.py
+++ b/python/mxnet/gluon/model_zoo/vision/mobilenet.py
@@ -201,7 +201,7 @@ def get_mobilenet(multiplier, pretrained=False, ctx=cpu(),
         version_suffix = '{0:.2f}'.format(multiplier)
         if version_suffix in ('1.00', '0.50'):
             version_suffix = version_suffix[:-1]
-        net.load_params(
+        net.load_parameters(
             get_model_file('mobilenet%s' % version_suffix, root=root), ctx=ctx)
     return net
 
@@ -233,7 +233,7 @@ def get_mobilenet_v2(multiplier, pretrained=False, ctx=cpu(),
         version_suffix = '{0:.2f}'.format(multiplier)
         if version_suffix in ('1.00', '0.50'):
             version_suffix = version_suffix[:-1]
-        net.load_params(
+        net.load_parameters(
             get_model_file('mobilenetv2_%s' % version_suffix, root=root), ctx=ctx)
     return net
 
diff --git a/python/mxnet/gluon/model_zoo/vision/resnet.py b/python/mxnet/gluon/model_zoo/vision/resnet.py
index 5ee67b510a8..da279b89583 100644
--- a/python/mxnet/gluon/model_zoo/vision/resnet.py
+++ b/python/mxnet/gluon/model_zoo/vision/resnet.py
@@ -386,8 +386,8 @@ def get_resnet(version, num_layers, pretrained=False, ctx=cpu(),
     net = resnet_class(block_class, layers, channels, **kwargs)
     if pretrained:
         from ..model_store import get_model_file
-        net.load_params(get_model_file('resnet%d_v%d'%(num_layers, version),
-                                       root=root), ctx=ctx)
+        net.load_parameters(get_model_file('resnet%d_v%d'%(num_layers, version),
+                                           root=root), ctx=ctx)
     return net
 
 def resnet18_v1(**kwargs):
diff --git a/python/mxnet/gluon/model_zoo/vision/squeezenet.py b/python/mxnet/gluon/model_zoo/vision/squeezenet.py
index 09f62a52074..aaff4c36dfa 100644
--- a/python/mxnet/gluon/model_zoo/vision/squeezenet.py
+++ b/python/mxnet/gluon/model_zoo/vision/squeezenet.py
@@ -132,7 +132,7 @@ def get_squeezenet(version, pretrained=False, ctx=cpu(),
     net = SqueezeNet(version, **kwargs)
     if pretrained:
         from ..model_store import get_model_file
-        net.load_params(get_model_file('squeezenet%s'%version, root=root), ctx=ctx)
+        net.load_parameters(get_model_file('squeezenet%s'%version, root=root), ctx=ctx)
     return net
 
 def squeezenet1_0(**kwargs):
diff --git a/python/mxnet/gluon/model_zoo/vision/vgg.py b/python/mxnet/gluon/model_zoo/vision/vgg.py
index dbae5385898..a3b1685b413 100644
--- a/python/mxnet/gluon/model_zoo/vision/vgg.py
+++ b/python/mxnet/gluon/model_zoo/vision/vgg.py
@@ -114,8 +114,8 @@ def get_vgg(num_layers, pretrained=False, ctx=cpu(),
     if pretrained:
         from ..model_store import get_model_file
         batch_norm_suffix = '_bn' if kwargs.get('batch_norm') else ''
-        net.load_params(get_model_file('vgg%d%s'%(num_layers, batch_norm_suffix),
-                                       root=root), ctx=ctx)
+        net.load_parameters(get_model_file('vgg%d%s'%(num_layers, batch_norm_suffix),
+                                           root=root), ctx=ctx)
     return net
 
 def vgg11(**kwargs):
diff --git a/python/mxnet/gluon/parameter.py b/python/mxnet/gluon/parameter.py
index 99885eb7d47..ac2eb40a643 100644
--- a/python/mxnet/gluon/parameter.py
+++ b/python/mxnet/gluon/parameter.py
@@ -342,6 +342,8 @@ def initialize(self, init=None, ctx=None, default_init=initializer.Uniform(),
     def reset_ctx(self, ctx):
         """Re-assign Parameter to other contexts.
 
+        Parameters
+        ----------
         ctx : Context or list of Context, default ``context.current_context()``.
             Assign Parameter to given context. If ctx is a list of Context, a
             copy will be made for each context.
@@ -478,8 +480,8 @@ def __init__(self, **kwargs):
                 super(Block, self).__init__(**kwargs)
                 self.const = self.params.get_constant('const', [[1,2],[3,4]])
 
-    Parameter
-    ---------
+    Parameters
+    ----------
     name : str
         Name of the parameter.
     value : array-like
@@ -619,7 +621,7 @@ def get_constant(self, name, value=None):
         found, :py:func:`get` will create a new :py:class:`Constant` with key-word
         arguments and insert it to self.
 
-        Constants
+        Parameters
         ----------
         name : str
             Name of the desired Constant. It will be prepended with this dictionary's
@@ -694,6 +696,8 @@ def zero_grad(self):
     def reset_ctx(self, ctx):
         """Re-assign all Parameters to other contexts.
 
+        Parameters
+        ----------
         ctx : Context or list of Context, default :py:meth:`context.current_context()`.
             Assign Parameter to given context. If ctx is a list of Context, a
             copy will be made for each context.
@@ -726,6 +730,8 @@ def setattr(self, name, value):
     def save(self, filename, strip_prefix=''):
         """Save parameters to file.
 
+        Parameters
+        ----------
         filename : str
             Path to parameter file.
         strip_prefix : str, default ''
@@ -750,6 +756,8 @@ def load(self, filename, ctx=None, allow_missing=False,
              ignore_extra=False, restore_prefix=''):
         """Load parameters from file.
 
+        Parameters
+        ----------
         filename : str
             Path to parameter file.
         ctx : Context or list of Context
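
The restored "Parameters" sections above document ParameterDict.save and
ParameterDict.load; a minimal sketch of that pair, with made-up prefixes and
file name to show strip_prefix and restore_prefix at work:

    import mxnet as mx
    from mxnet import gluon

    net = gluon.nn.Dense(2, in_units=4, prefix='model_')
    net.initialize()

    # Strip the block prefix so the saved parameter names are prefix-agnostic.
    net.collect_params().save('dense.params', strip_prefix='model_')

    # Load into a block that uses a different prefix.
    net2 = gluon.nn.Dense(2, in_units=4, prefix='other_')
    net2.collect_params().load('dense.params', ctx=mx.cpu(),
                               restore_prefix='other_')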
diff --git a/scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala b/scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala
index 7289df19712..87c9bc72be0 100644
--- a/scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala
+++ b/scala-package/core/src/main/scala/org/apache/mxnet/FeedForward.scala
@@ -224,13 +224,24 @@ class FeedForward private(
     var i = 0
     while (data.hasNext && i != numBatch) {
       val batch = data.next()
-      i += 1
-      ExecutorManager.loadData(batch, dataArrays)
-      predExec.forward(isTrain = false)
-      val padded = batch.pad
-      val realSize = batchSize - padded
-      for ((list, nd) <- outputs zip predExec.outputs) {
-        list += nd.slice(0, realSize).copy()
+      try {
+        i += 1
+        ExecutorManager.loadData(batch, dataArrays)
+        predExec.forward(isTrain = false)
+        val padded = batch.pad
+        val realSize = batchSize - padded
+        for ((list, nd) <- outputs zip predExec.outputs) {
+          // Assign the slice to a val so it can be disposed after the copy;
+          // the one-liner nd.slice().copy() would leak the slice's memory.
+          val ndSliced = nd.slice(0, realSize)
+          try {
+            list += ndSliced.copy()
+          } finally {
+            ndSliced.dispose()
+          }
+        }
+      } finally {
+        batch.dispose()
       }
     }
     // TODO(Yizhi): we can use Symbol.concat to do the same thing. Can it be more efficient?
diff --git a/src/operator/nn/convolution.cu b/src/operator/nn/convolution.cu
index 65a320ded16..9f61212d5c7 100644
--- a/src/operator/nn/convolution.cu
+++ b/src/operator/nn/convolution.cu
@@ -89,8 +89,11 @@ void ConvolutionCompute<gpu>(const nnvm::NodeAttrs& attrs,
   const ConvolutionParam& param = nnvm::get<ConvolutionParam>(attrs.parsed);
   int dtype = inputs[conv::kData].type_flag_;
 
-  // If 1D convolution, use MXNet implementation
-  if (param.kernel.ndim() == 1) {
+#if CUDNN_MAJOR < 5
+  if (param.layout.value() != mshadow::kNCW &&
+      param.layout.value() != mshadow::kNCHW &&
+      param.layout.value() != mshadow::kNCDHW) {
+    // Non-default layouts need cuDNN >= 5.0; fall back to the MXNet implementation.
     MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
       ConvolutionOp<gpu, DType> op;
       op.Init(param);
@@ -98,6 +101,8 @@ void ConvolutionCompute<gpu>(const nnvm::NodeAttrs& attrs,
     })
     return;
   }
+#endif
+
 #if MXNET_USE_CUDNN == 0 || CUDNN_MAJOR < 7
   if (param.num_filter == param.num_group &&
       param.layout.value() == mshadow::kNCHW &&
@@ -162,8 +167,11 @@ void ConvolutionGradCompute<gpu>(const nnvm::NodeAttrs& attrs,
   const std::vector<TBlob> &in_grad = outputs;
   int dtype = out_grad.type_flag_;
 
-  // If 1D convolution, use MXNet implementation
-  if (param.kernel.ndim() == 1) {
+#if CUDNN_MAJOR < 5
+  if (param.layout.value() != mshadow::kNCW &&
+      param.layout.value() != mshadow::kNCHW &&
+      param.layout.value() != mshadow::kNCDHW) {
+    // Non-default layouts need cuDNN >= 5.0; fall back to the MXNet implementation.
     MSHADOW_REAL_TYPE_SWITCH(dtype, DType, {
       ConvolutionOp<gpu, DType> op;
       op.Init(param);
@@ -171,6 +179,7 @@ void ConvolutionGradCompute<gpu>(const nnvm::NodeAttrs& attrs,
     })
     return;
   }
+#endif
 #if MXNET_USE_CUDNN == 0 || CUDNN_MAJOR < 7
   if (param.num_filter == param.num_group &&
       param.layout.value() == mshadow::kNCHW &&
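
A minimal sketch that exercises the Conv1D path these hunks route back
through cuDNN (assumes a CUDA build with cuDNN available; falls back to CPU
otherwise, where the MXNet implementation is used):

    import mxnet as mx
    from mxnet import gluon

    try:
        mx.nd.zeros((1,), ctx=mx.gpu(0)).asnumpy()  # force eval to probe for a GPU
        ctx = mx.gpu(0)
    except mx.MXNetError:
        ctx = mx.cpu()

    net = gluon.nn.Conv1D(channels=8, kernel_size=3, layout='NCW')
    net.initialize(ctx=ctx)
    x = mx.nd.random.uniform(shape=(2, 4, 16), ctx=ctx)  # (batch, channel, width)
    print(net(x).shape)  # (2, 8, 14)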
diff --git a/tests/python/unittest/test_gluon.py b/tests/python/unittest/test_gluon.py
index 0a5bda831d9..2bd5fd4f4e3 100644
--- a/tests/python/unittest/test_gluon.py
+++ b/tests/python/unittest/test_gluon.py
@@ -96,20 +96,20 @@ def forward(self, x):
     net1.collect_params().initialize()
     net2(mx.nd.zeros((3, 5)))
 
-    net1.save_params('net1.params')
+    net1.save_parameters('net1.params')
 
     net3 = Net(prefix='net3_')
-    net3.load_params('net1.params', mx.cpu())
+    net3.load_parameters('net1.params', mx.cpu())
 
     net4 = Net(prefix='net4_')
     net5 = Net(prefix='net5_', in_units=5, params=net4.collect_params())
     net4.collect_params().initialize()
     net5(mx.nd.zeros((3, 5)))
 
-    net4.save_params('net4.params')
+    net4.save_parameters('net4.params')
 
     net6 = Net(prefix='net6_')
-    net6.load_params('net4.params', mx.cpu())
+    net6.load_parameters('net4.params', mx.cpu())
 
 
 @with_seed()
@@ -672,7 +672,7 @@ def test_export():
     model = gluon.model_zoo.vision.resnet18_v1(
         prefix='resnet', ctx=ctx, pretrained=True)
     model.hybridize()
-    data = mx.nd.random.normal(shape=(1, 3, 224, 224))
+    data = mx.nd.random.normal(shape=(1, 3, 32, 32))
     out = model(data)
 
     model.export('gluon')
@@ -690,6 +690,22 @@ def test_export():
 
     assert_almost_equal(out.asnumpy(), out2.asnumpy())
 
+@with_seed()
+def test_import():
+    ctx = mx.context.current_context()
+    net1 = gluon.model_zoo.vision.resnet18_v1(
+        prefix='resnet', ctx=ctx, pretrained=True)
+    net1.hybridize()
+    data = mx.nd.random.normal(shape=(1, 3, 32, 32))
+    out1 = net1(data)
+
+    net1.export('net1', epoch=1)
+
+    net2 = gluon.SymbolBlock.imports(
+        'net1-symbol.json', ['data'], 'net1-0001.params', ctx)
+    out2 = net2(data)
+
+    assert_almost_equal(out1.asnumpy(), out2.asnumpy())
 
 @with_seed()
 def test_hybrid_stale_cache():
@@ -806,7 +822,7 @@ def test_fill_shape_load():
     net1.hybridize()
     net1.initialize(ctx=ctx)
     net1(mx.nd.ones((2,3,5,7), ctx))
-    net1.save_params('net_fill.params')
+    net1.save_parameters('net_fill.params')
 
     net2 = nn.HybridSequential()
     with net2.name_scope():
@@ -815,7 +831,7 @@ def test_fill_shape_load():
                  nn.Dense(10))
     net2.hybridize()
     net2.initialize()
-    net2.load_params('net_fill.params', ctx)
+    net2.load_parameters('net_fill.params', ctx)
     assert net2[0].weight.shape[1] == 3, net2[0].weight.shape[1]
     assert net2[1].gamma.shape[0] == 64, net2[1].gamma.shape[0]
     assert net2[2].weight.shape[1] == 3072, net2[2].weight.shape[1]
@@ -959,14 +975,80 @@ def test_req():
 
 def test_save_load():
     net = mx.gluon.model_zoo.vision.get_resnet(1, 18, pretrained=True)
-    net.save_params('test.params')
+    net.save_parameters('test_save_load.params')
 
     net = mx.gluon.model_zoo.vision.get_resnet(1, 18)
     net.output = mx.gluon.nn.Dense(1000)
 
-    net.load_params('test.params')
+    net.load_parameters('test_save_load.params')
+
+@with_seed()
+def test_symbol_block_save_load():
+    class Net(gluon.HybridBlock):
+        def __init__(self):
+            super(Net, self).__init__()
+            with self.name_scope():
+                backbone = gluon.model_zoo.vision.resnet18_v1()
+                data = mx.sym.var('data')
+                featnames = ['stage1_activation0', 'stage2_activation0', 'stage3_activation0']
+                out_names = ['_'.join([backbone.name, featname, 'output']) for featname in featnames]
+                internals = backbone(data).get_internals()
+                outs = [internals[out_name] for out_name in out_names]
+                self.backbone = gluon.SymbolBlock(outs, data, params=backbone.collect_params())
+                self.body = nn.Conv2D(3, 1)
+
+        def hybrid_forward(self, F, x):
+            x = self.body(x)
+            return self.backbone(x)
+
+    net1 = Net()
+    net1.initialize(mx.init.Normal())
+    net1.hybridize()
+    net1(mx.nd.random.normal(shape=(1, 3, 32, 32)))
+    net1.save_parameters('./test_symbol_block_save_load.params')
+
+    net2 = Net()
+    net2.load_parameters('./test_symbol_block_save_load.params', ctx=mx.cpu())
 
 
+@with_seed()
+def test_hybrid_multi_context():
+    net = mx.gluon.model_zoo.vision.get_resnet(1, 18)
+    net.initialize(ctx=[mx.cpu(0), mx.cpu(1)])
+    net.hybridize()
+    net(mx.nd.zeros((1, 3, 32, 32), ctx=mx.cpu(0))).asnumpy()
+
+@with_seed()
+def test_zero_grad():
+    data = mx.nd.random.uniform(shape=(3,3))
+    net = nn.Embedding(3, 4, sparse_grad=True, prefix='test_zero_grad_')
+    net.initialize()
+    with mx.autograd.record():
+        l = net(data)
+        l.backward()
+    net.collect_params().zero_grad()
+    grad = net.collect_params()['test_zero_grad_weight'].grad()
+    assert_almost_equal(grad.asnumpy(), grad.asnumpy() * 0)
+
+def check_hybrid_static_memory(**kwargs):
+    x = mx.nd.random.uniform(shape=(2, 3, 32, 32))
+    x.attach_grad()
+
+@with_seed()
+def test_legacy_save_params():
+    net = gluon.nn.HybridSequential(prefix='')
+    with net.name_scope():
+        net.add(gluon.nn.Conv2D(10, (3, 3)))
+        net.add(gluon.nn.Dense(50))
+    net.initialize()
+    net(mx.nd.ones((1,1,50,50)))
+    a = net(mx.sym.var('data'))
+    a.save('test.json')
+    net.save_params('test.params')
+    model = gluon.nn.SymbolBlock(outputs=mx.sym.load_json(open('test.json', 'r').read()),
+                                 inputs=mx.sym.var('data'))
+    model.load_params('test.params', ctx=mx.cpu())
+
 
 if __name__ == '__main__':
     import nose
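
To smoke-test the cases added in this diff, one might run nose against the
individual tests (command is illustrative; assumes nose is installed and the
repository root is the working directory):

    nosetests -v tests/python/unittest/test_gluon.py:test_import \
        tests/python/unittest/test_gluon.py:test_legacy_save_params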


 
