Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/11/25 21:18:55 UTC

[GitHub] [incubator-mxnet] bgawrych opened a new pull request #19587: [FEATURE] Restore Quantization API to MXNet

bgawrych opened a new pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587


   ## Description ##
   This PR restores the quantization API and some examples to the master branch of MXNet. The change was prepared together with @sfraczek and @grygielski.
   The quantization API now utilizes HybridBlock as the symbol executor and new features like `optimize_for`. A minimal usage sketch is included after the list below.
   Moreover:
   - enabled numpy semantics support
   - improved the custom calibration flow (users can now use the names of layers that need calibration in their layer output collectors)
   - renamed quantize_net_v2 to quantize_net
   - used DataLoader instead of DataIter in model calibration
   - retained quantize_model and quantize_model_mkldnn, which work with Symbol
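   For illustration, a minimal sketch of the restored flow (parameter names such as `calib_mode`, `calib_data` and `num_calib_batches` are illustrative assumptions based on this description and may not match the final `quantize_net` signature in this PR):

```python
# Hedged sketch: quantize a pretrained Gluon HybridBlock with calibration
# driven by a regular DataLoader (not a DataIter). Keyword names are
# assumptions based on the PR description, not the confirmed merged API.
import mxnet as mx
from mxnet.gluon.data import DataLoader
from mxnet.gluon.data.vision import transforms, ImageFolderDataset
from mxnet.gluon.model_zoo.vision import resnet50_v1
from mxnet.contrib.quantization import quantize_net

net = resnet50_v1(pretrained=True)   # FP32 HybridBlock to be quantized

# Calibration data comes from a Gluon DataLoader (path is illustrative).
transform = transforms.Compose([transforms.Resize(256),
                                transforms.CenterCrop(224),
                                transforms.ToTensor()])
calib_data = DataLoader(
    ImageFolderDataset('./calib_images').transform_first(transform),
    batch_size=32)

# quantize_net returns a new HybridBlock whose supported layers run in INT8.
qnet = quantize_net(net,
                    quantized_dtype='auto',   # int8/uint8 chosen automatically
                    calib_mode='naive',       # or 'entropy' / 'none'
                    calib_data=calib_data,
                    num_calib_batches=10,
                    ctx=mx.cpu())
```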
   
   
   ## Checklist ##
   ### Essentials ###
   - [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   - [x] Code is well-documented
   
   @anko-intel @sfraczek @grygielski @TaoLv @pengzhao-intel @ciyongch 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742626531


   @bgawrych you can refer to https://github.com/apache/incubator-mxnet/issues/19623 for the CI issue. It looks like an effect of https://github.com/apache/incubator-mxnet/issues/18501 together with the increased memory usage in recent versions of gcc. If you have time to help fix it, that would be great.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742513350


   Jenkins CI successfully triggered : [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742461667


   > Do we need to start a new RFC or design doc as demonstrated originally in [1] and [2]?
   > 
   > [1] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
   > [2] #9552
   
   We haven't changed the design of quantization - all code changes in the backend are related to the changed node naming conventions. I don't know whether we must propose a new RFC in this case, as the previous one was accepted and all we have done here is bring it back.
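   For context, continuing the sketch shown under the PR description above (where `net` and `calib_data` are defined), a hedged illustration of what the new HybridBlock-style node names look like when excluding layers from quantization; the names are copied from imagenet_gen_qsym_onednn.py in this PR, while the `exclude_layers` keyword is an assumption, not a confirmed parameter name:

```python
# Hedged sketch, not the confirmed API: excluded layer names follow the new
# Gluon/HybridBlock naming convention (copied from the example script in this PR).
excluded_sym_names = [
    'resnetv2_hybridsequential0_flatten0_flatten0',
    'resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd',
]
# `net` and `calib_data` are defined as in the earlier sketch; the
# `exclude_layers` keyword name is an assumption.
qnet = quantize_net(net, exclude_layers=excluded_sym_names,
                    calib_mode='naive', calib_data=calib_data)
```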
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] TaoLv commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
TaoLv commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539806803



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)
+<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+
+Intel® oneDNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with oneDNN backend](#TODO(agrygielski)).

Review comment:
       Remove the TODO?

##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)
+<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+
+Intel® oneDNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with oneDNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_onednn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel oneDNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        model must be provided to `model` directory in the same path
+                        as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  excluding quantizing the first conv layer since the
+                        input data may have negative value which doesn't
+                        support at moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of log
+```
+
+A new benchmark script `launch_inference_onednn.sh` has been designed to launch performance benchmark for float32 or int8 image-classification models with Intel® oneDNN.

Review comment:
       > `float32 or int8`
   
   I see `FP32` in the first paragraph. We need to unify the term throughout the tutorial.

##########
File path: example/quantization/imagenet_gen_qsym_onednn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:
+    # alexnet
+    # densenet121, densenet161
+    # densenet169, densenet201
+    # inceptionv3
+    # mobilenet0.25, mobilenet0.5, mobilenet0.75, mobilenet1.0,
+    # mobilenetv2_0.25, mobilenetv2_0.5, mobilenetv2_0.75, mobilenetv2_1.0
+    # resnet101_v1, resnet152_v1, resnet18_v1, resnet34_v1, resnet50_v1
+    # resnet101_v2, resnet152_v2, resnet18_v2, resnet34_v2, resnet50_v2
+    # squeezenet1.0, squeezenet1.1
+    # vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn
+    exclude_symbol_regex = {
+        'mobilenet[^v]': ['mobilenet_hybridsequential0_flatten0_flatten0', 'mobilenet_hybridsequential0_globalavgpool2d0_fwd'],
+        'mobilenetv2': ['mobilenetv2_hybridsequential1_flatten0_flatten0'],
+        # resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd is excluded for the sake of accuracy
+        'resnet.*v2': ['resnetv2_hybridsequential0_flatten0_flatten0', 'resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd'],
+        'squeezenet1': ['squeezenet_hybridsequential1_flatten0_flatten0'],
+    }
+    excluded_sym_names = regex_find_excluded_symbols(exclude_symbol_regex, model_name)
+    if excluded_sym_names is None:
+        excluded_sym_names = []
+    if exclude_first_conv:
+        first_conv_regex = {
+            'alexnet': ['alexnet_hybridsequential0_conv2d0_fwd'],
+            'densenet': ['densenet_hybridsequential0_conv2d0_fwd'],
+            'inceptionv3': ['inception3_hybridsequential0_hybridsequential0_conv2d0_fwd'],
+            'mobilenet[^v]': ['mobilenet_hybridsequential0_conv2d0_fwd'],
+            'mobilenetv2': ['mobilenetv2_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v1': ['resnetv1_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v2': ['resnetv2_hybridsequential0_conv2d0_fwd'],
+            'squeezenet1': ['squeezenet_hybridsequential0_conv2d0_fwd'],
+            'vgg': ['vgg_hybridsequential0_conv2d0_fwd'],
+        }
+        excluded_first_conv_sym_names = regex_find_excluded_symbols(first_conv_regex, model_name)
+        if excluded_first_conv_sym_names is None:
+            raise ValueError('Currently, model %s is not supported in this script' % model_name)
+        excluded_sym_names += excluded_first_conv_sym_names
+    return excluded_sym_names
+
+
+

Review comment:
       It looks like there are too many blank lines.

##########
File path: example/quantization/imagenet_gen_qsym_onednn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:

Review comment:
       Would it be better to use a docstring?
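       For reference, a hypothetical sketch of this suggestion (the actual change made in the PR may differ): move the comment block into the function's docstring.

```python
def get_exclude_symbols(model_name, exclude_first_conv):
    """Return the list of symbol names to exclude from quantization.

    Grouped supported models at the time of commit: alexnet,
    densenet121/161/169/201, inceptionv3, mobilenet0.25-1.0,
    mobilenetv2_0.25-1.0, resnet18/34/50/101/152 (v1 and v2),
    squeezenet1.0/1.1, vgg11-19 (with and without bn).
    """
    ...
```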




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539282606



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        model must be provided to `model` directory in the same path
+                        as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  excluding quantizing the first conv layer since the
+                        input data may have negative value which doesn't
+                        support at moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of log
+```
+
+A new benchmark script `launch_inference_mkldnn.sh` has been designed to launch performance benchmark for float32 or int8 image-classification models with Intel® MKL-DNN.
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteraton] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help                show this help message and exit
+  -s, --symbol_file         symbol file for benchmark, required
+  -b, --batch_size          inference batch size
+                            default: 64
+  -iter, --iteration        inference iteration
+                            default: 500
+  -ins, --instance          launch multi-instance inference
+                            default: one instance per socket
+  -c, --core                number of cores per instance
+                            default: divide full physical cores
+
+example: resnet int8 performance benchmark on c5.24xlarge(duo sockets, 24 physical cores per socket).
+
+    bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for throughput benchmark and each instance will use 24 physical cores.
+```
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
       I've added the verified models with their tested accuracy - we will add more of them in the future.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536244971



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN to (U)INT8 model.

Review comment:
       Yes! :) There is some discussion about renaming MKL-DNN to oneDNN in this RFC: https://github.com/apache/incubator-mxnet/issues/19610
   For now I've changed only the occurrences in example/quantization/* and used the full name in this particular line.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] TaoLv commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
TaoLv commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742211234


   Do we need to start a new RFC or design doc as demonstrated originally in [1] and [2]?
   
   [1] https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN
   [2] https://github.com/apache/incubator-mxnet/pull/9552


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536247183



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        model must be provided to `model` directory in the same path
+                        as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  excluding quantizing the first conv layer since the
+                        input data may have negative value which doesn't
+                        support at moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of log
+```
+
+A new benchmark script `launch_inference_mkldnn.sh` has been designed to launch performance benchmark for float32 or int8 image-classification models with Intel® MKL-DNN.
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteraton] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help                show this help message and exit
+  -s, --symbol_file         symbol file for benchmark, required
+  -b, --batch_size          inference batch size
+                            default: 64
+  -iter, --iteration        inference iteration
+                            default: 500
+  -ins, --instance          launch multi-instance inference
+                            default: one instance per socket
+  -c, --core                number of cores per instance
+                            default: divide full physical cores
+
+example: resnet int8 performance benchmark on c5.24xlarge(duo sockets, 24 physical cores per socket).
+
+    bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for throughput benchmark and each instance will use 24 physical cores.
+```
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
       We chose ResNetV1 as an example - previously this file contained more models, but the content was almost the same (only the model name changed).
   Moreover, only some models work with the script because of this issue: https://github.com/apache/incubator-mxnet/issues/19580, and GluonCV is not yet compatible with the master branch.
   We will add references when more models become available.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539946288



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)
+<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+
+Intel® oneDNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with oneDNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_onednn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel oneDNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        model must be provided to `model` directory in the same path
+                        as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  excluding quantizing the first conv layer since the
+                        input data may have negative value which doesn't
+                        support at moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of log
+```
+
+A new benchmark script `launch_inference_onednn.sh` has been designed to launch performance benchmark for float32 or int8 image-classification models with Intel® oneDNN.

Review comment:
       Done

##########
File path: example/quantization/imagenet_gen_qsym_onednn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536242869



##########
File path: tests/python/quantization/test_quantization.py
##########
@@ -0,0 +1,1210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Some of the tests using CUDNN require a special GPU instruction called dp4a.
+Ref: http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
+"""
+import os
+import mxnet as mx
+import numpy as np
+from mxnet.gluon.model_zoo import vision
+from mxnet.test_utils import assert_almost_equal, assert_exception, rand_ndarray, rand_shape_nd, same, DummyIter
+from common import xfail_when_nonstandard_decimal_separator
+from mxnet.io import NDArrayIter
+import unittest
+import operator
+
+
+def initialize_block_params(block, initializer):
+    for name, param in block.collect_params('.*gamma|.*running_var|.*moving_var').items():
+        param.initialize(mx.init.Constant(1))
+    for name, param in block.collect_params('.*beta|.*bias|.*moving_mean|.*running_mean').items():
+        param.initialize(mx.init.Constant(0))
+    for name, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def collect_block_args_aux(block, sym):
+  arg_params, aux_params = dict(), dict()
+  for k, v in block.collect_params().items():
+    if k in sym.list_arguments():
+      arg_params[k]= v._reduce()
+    elif k in sym.list_auxiliary_states():
+      aux_params[k]= v._reduce()
+  return arg_params, aux_params
+
+def is_test_for_gpu():
+    return mx.current_context().device_type == 'gpu'
+
+
+def is_test_for_mkldnn():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') == '1')
+
+
+def is_test_for_native_cpu():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') == None)
+
+
+def test_quantize_float32_to_int8():
+    shape = rand_shape_nd(4)
+    data = rand_ndarray(shape, 'default', dtype='float32')
+    min_range = mx.nd.min(data)
+    max_range = mx.nd.max(data)
+    qdata, min_val, max_val = mx.nd.contrib.quantize(data, min_range, max_range, out_type='int8')
+    data_np = data.asnumpy()
+    min_range = min_range.asscalar()
+    max_range = max_range.asscalar()
+    real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+    quantized_range = 127.0
+    scale = quantized_range / real_range
+    assert qdata.dtype == np.int8
+    assert min_val.dtype == np.float32
+    assert max_val.dtype == np.float32
+    assert same(min_val.asscalar(), -real_range)
+    assert same(max_val.asscalar(), real_range)
+    qdata_np = (np.sign(data_np) * np.minimum(np.abs(data_np) * scale + 0.5, quantized_range)).astype(np.int8)
+    assert_almost_equal(qdata.asnumpy(), qdata_np, atol = 1)
+
+
+def test_dequantize_int8_to_float32():
+
+    def get_test_data(real_range, qdata_np):
+        qdata = mx.nd.array(qdata_np, dtype=np.int8)
+        min_range = mx.nd.array([-real_range], dtype=np.float32)
+        max_range = mx.nd.array([real_range], dtype=np.float32)
+        return qdata, min_range, max_range
+
+    def baseline_dequantization(qdata, real_range, qdata_np):
+        quantized_range = 127.0
+        scale = real_range / quantized_range
+        data_np = qdata_np * scale
+        return data_np
+
+    def test_nd_array_dequantization(qdata, min_range, max_range, expected_result):
+        data = mx.nd.contrib.dequantize(qdata, min_range, max_range, out_type='float32')
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    def test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result):
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
+                                            sym_max_range, out_type='float32')
+        out = dequant._bind(ctx=mx.current_context(),
+                           args={'data':qdata, 'min_range':min_range, 'max_range':max_range})
+        data = out.forward()[0]
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    real_range = 128
+    shape = rand_shape_nd(4)
+    qdata_np = np.random.uniform(low=-127, high=127, size=shape).astype(dtype=np.int8)
+    qdata, min_range, max_range = get_test_data(real_range, qdata_np)
+    expected_result = baseline_dequantization(qdata, real_range, qdata_np)
+    # test nd array implementation.
+    test_nd_array_dequantization(qdata, min_range, max_range, expected_result)
+    # test symbolic api implementaion.
+    test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result)
+
+
+def test_requantize_int32_to_int8():
+    def quantized_int32_to_float(qdata, min_range, max_range):
+        assert qdata.dtype == 'int32'
+        quantized_range = np.iinfo('int32').max
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        scale = float(real_range) / float(quantized_range)
+        return qdata.astype('float32') * scale
+
+    def float_to_quantized_int8(data, min_range, max_range):
+        assert data.dtype == 'float32'
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        quantized_range = np.iinfo('int8').max
+        scale = float(quantized_range) / float(real_range)
+        return (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5, quantized_range)).astype('int8')
+
+    def requantize(qdata, min_data, max_data, real_range):
+        data = quantized_int32_to_float(qdata, min_data, max_data)
+        output = float_to_quantized_int8(data, -real_range, real_range)
+        return output, -real_range, real_range
+
+    def requantize_baseline(qdata, min_data, max_data, min_calib_range=None, max_calib_range=None):
+        if min_calib_range is not None and max_calib_range is not None:
+            real_range = np.maximum(np.abs(min_calib_range), np.abs(max_calib_range))
+            return requantize(qdata, min_data, max_data, real_range)
+        else:
+            min_range = quantized_int32_to_float(np.min(qdata), min_data, max_data)
+            max_range = quantized_int32_to_float(np.max(qdata), min_data, max_data)
+            return requantize(qdata, min_data, max_data, np.maximum(np.abs(min_range), np.abs(max_range)))
+
+    def check_requantize(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        if min_calib_range is None or max_calib_range is None:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range)
+        else:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range,
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    def check_requantize_with_symbol(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        if min_calib_range is None or max_calib_range is None:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
+            out = requant._bind(ctx=mx.current_context(),
+                               args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+        else:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range,
+                                                min_calib_range=min_calib_range,
+                                                max_calib_range=max_calib_range)
+            out = requant._bind(ctx=mx.current_context(), args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    # test with symbol API.
+    check_requantize_with_symbol((3, 4, 10, 10))
+    check_requantize_with_symbol((32, 3, 23, 23))
+    check_requantize_with_symbol((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize_with_symbol((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+    # Test with nd array API
+    check_requantize((3, 4, 10, 10))
+    check_requantize((32, 3, 23, 23))
+    check_requantize((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+
+
+def test_quantized_conv():
+    def check_quantized_conv(data_shape, kernel, num_filter, pad, stride, dilate, no_bias, qdtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_conv for native cpu since it is not supported yet')
+            return
+        elif is_test_for_mkldnn():
+            # (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
+            print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')
+            return
+        elif qdtype == 'uint8' and is_test_for_gpu():
+            print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
+            return
+        elif is_test_for_gpu() and len(data_shape) != 4:
+            print('skipped testing quantized_conv for gpu 5d layout since it is not supported yet')
+            return
+
+        # run fp32 conv
+        data = mx.sym.Variable(name='data', shape=data_shape, dtype='float32')
+        conv = mx.sym.Convolution(data=data, kernel=kernel, num_filter=num_filter, pad=pad, stride=stride,
+                                  dilate=dilate, no_bias=no_bias, cudnn_off=False, name='conv')
+        arg_shapes, _, _ = conv.infer_shape(data=data_shape)
+        arg_names = conv.list_arguments()
+        conv_exe_fp32 = conv._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qdtype == 'uint8':
+            data_low = 0.0
+            data_high = 127.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+        conv_exe_fp32.arg_dict[arg_names[0]][:] = mx.nd.random.uniform(low=data_low, high=data_high,
+                                                                       shape=data_shape).astype('int32')
+        conv_exe_fp32.arg_dict[arg_names[1]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                       shape=arg_shapes[1]).astype('int32')
+        if not no_bias:
+            conv_exe_fp32.arg_dict[arg_names[2]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                           shape=arg_shapes[2]).astype('int32')
+        output = conv_exe_fp32.forward()[0]
+
+        # run quantized conv
+        qdata = mx.sym.Variable(name='qdata', shape=data_shape, dtype=qdtype)
+        qweight = mx.sym.Variable(name='qweight', dtype='int8')
+        min_data = mx.sym.Variable(name='min_data')
+        max_data = mx.sym.Variable(name='max_data')
+        min_weight = mx.sym.Variable(name='min_weight')
+        max_weight = mx.sym.Variable(name='max_weight')
+        quantized_conv = mx.sym.contrib.quantized_conv(data=qdata, weight=qweight, min_data=min_data,
+                                                       max_data=max_data, min_weight=min_weight,
+                                                       max_weight=max_weight, kernel=kernel,
+                                                       num_filter=num_filter, pad=pad, stride=stride,
+                                                       dilate=dilate, no_bias=no_bias)
+        qarg_names = quantized_conv.list_arguments()
+        type_dict = None
+        if not no_bias:
+            type_dict = {qarg_names[2]: 'int8'}
+        conv_exe_int8 = quantized_conv._simple_bind(ctx=mx.current_context(), type_dict=type_dict, grad_req='null')
+        conv_exe_int8.arg_dict[qarg_names[0]][:] = conv_exe_fp32.arg_dict[arg_names[0]].astype(qdtype)
+        conv_exe_int8.arg_dict[qarg_names[1]][:] = conv_exe_fp32.arg_dict[arg_names[1]].astype('int8')
+        quantized_range = 127.0
+        if no_bias:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = quantized_range
+        else:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = conv_exe_fp32.arg_dict[arg_names[2]].astype('int8')
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[6]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[7]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[8]][:] = quantized_range
+        qoutput, min_range, max_range = conv_exe_int8.forward()
+
+        if no_bias:
+            assert_almost_equal(output.asnumpy(), qoutput.asnumpy(), atol = 1)
+        else:
+            # with adding bias, accuracy loss should not be greater than one
+            diff = mx.nd.abs(output - qoutput.astype(output.dtype))
+            cond = mx.nd.lesser(2, diff).sum().asscalar()
+            assert cond == 0
+
+    for qdtype in ['int8', 'uint8']:
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), True, qdtype)
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), True, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), True, qdtype)
+
+
+def test_quantized_elemwise_add():
+    def check_quantized_elemwise_add(data_shape, qtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_elemwise_add for native cpu since it is not supported yet')
+            return
+        elif qtype != 'uint8' and qtype != 'int8':
+            print('skipped testing quantized_elemwise_add for unsupported data type')
+            return
+        elif is_test_for_gpu():
+            print('skipped testing quantized_elemwise_add for gpu since it is not supported yet')
+            return
+
+        dataA = mx.sym.Variable(name='dataA', shape=data_shape, dtype='float32')
+        dataB = mx.sym.Variable(name='dataB', shape=data_shape, dtype='float32')
+        elemwise_add_fp32 = mx.sym.elemwise_add(dataA, dataB)
+        arg_names = elemwise_add_fp32.list_arguments()
+        elemwise_add_fp32_exe = elemwise_add_fp32._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qtype == 'uint8':
+            data_low = 0.0
+            data_high = 255.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+
+        dataA_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        dataB_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        elemwise_add_fp32_exe.arg_dict[arg_names[0]][:] = dataA_val
+
+        elemwise_add_fp32_exe.arg_dict[arg_names[1]][:] = dataB_val
+
+        output = elemwise_add_fp32_exe.forward()[0]
+        print(output)
+        qdataA = mx.sym.Variable(name='qdataA', shape=data_shape, dtype=qtype)
+        qdataB = mx.sym.Variable(name='qdataB', shape=data_shape, dtype=qtype)
+        min_dataA = mx.sym.Variable(name='min_dataA', dtype='float32')
+        max_dataA = mx.sym.Variable(name='max_dataA', dtype='float32')
+        min_dataB = mx.sym.Variable(name='min_dataB', dtype='float32')
+        max_dataB = mx.sym.Variable(name='max_dataB', dtype='float32')
+        quantized_elemwise_add = mx.sym.contrib.quantized_elemwise_add(qdataA, qdataB, min_dataA, max_dataA, min_dataB, max_dataB)
+        elemwise_add_int8_exe = quantized_elemwise_add._simple_bind(ctx=mx.current_context(), grad_req='null')
+        qarg_names = quantized_elemwise_add.list_arguments()
+        elemwise_add_int8_exe.arg_dict[qarg_names[0]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[0]].astype(qtype)
+        elemwise_add_int8_exe.arg_dict[qarg_names[1]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[1]].astype(qtype)
+        quantized_range = 127.0
+        elemwise_add_int8_exe.arg_dict[qarg_names[2]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[3]][:] = data_high
+        elemwise_add_int8_exe.arg_dict[qarg_names[4]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[5]][:] = data_high
+        qoutput, min_range, max_range = elemwise_add_int8_exe.forward()
+        print(qoutput)
+        int8_rslt = qoutput.astype(output.dtype)*max_range/0x7fffffff
+        print(int8_rslt)

Review comment:
       Good catch, thank you and done :)
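
For readers skimming the thread, here is a minimal standalone NumPy sketch of the int32 -> int8 requantize math that the quoted test checks. It mirrors the `requantize_baseline` helper above and assumes symmetric ranges; it is an illustration for this thread, not part of the PR.

```python
import numpy as np

def requantize_int32_to_int8(qdata, min_data, max_data):
    # int32 values were quantized over [min_data, max_data]; map them back to
    # float32 using the int32 range, then re-quantize symmetrically to int8.
    real_range = max(abs(min_data), abs(max_data))
    fp32 = qdata.astype('float32') * (real_range / np.iinfo(np.int32).max)
    out_range = max(abs(fp32.min()), abs(fp32.max()))
    scale = np.iinfo(np.int8).max / out_range
    qout = (np.sign(fp32) * np.minimum(np.abs(fp32) * scale + 0.5,
                                       np.iinfo(np.int8).max)).astype('int8')
    return qout, -out_range, out_range

qdata = np.random.uniform(-1000, 1000, size=(3, 4)).astype('int32')
print(requantize_int32_to_int8(qdata, -1010.0, 1020.0))
```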




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
ciyongch commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536164436



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model to an (U)INT8 model with Intel® MKL-DNN.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply the quantization flow to your project directly, please refer to [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If --no-pretrained is set, the
+                        model must be placed in the `model` directory in the same
+                        path as this python script; default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  exclude the first conv layer from quantization, since
+                        the input data may contain negative values, which are
+                        not supported at the moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of log
+```
+
+A new benchmark script `launch_inference_mkldnn.sh` has been designed to launch performance benchmark for float32 or int8 image-classification models with Intel® MKL-DNN.
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteration] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help                show this help message and exit
+  -s, --symbol_file         symbol file for benchmark, required
+  -b, --batch_size          inference batch size
+                            default: 64
+  -iter, --iteration        inference iteration
+                            default: 500
+  -ins, --instance          launch multi-instance inference
+                            default: one instance per socket
+  -c, --core                number of cores per instance
+                            default: all physical cores divided evenly among instances
+
+example: resnet int8 performance benchmark on c5.24xlarge (two sockets, 24 physical cores per socket).
+
+    bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for the throughput benchmark, and each instance will use 24 physical cores.
+```
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
       Previously, we listed all the models that support quantization for better reference. Is there any limitation that prevents supporting other models, or is there another reason?
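
For context on how the README options quoted above feed the restored API, here is a hedged sketch of a `quantize_net` call (the import appears in the example script quoted later in this thread; the keyword names below are assumptions based on the CLI flags and the older `quantize_net_v2` signature, so check this PR's API docs for the exact parameters):

```python
import mxnet as mx
from mxnet.contrib.quantization import quantize_net
from mxnet.gluon.model_zoo.vision import resnet50_v1

net = resnet50_v1(pretrained=True)

# Calibration data served through a DataLoader (random data here just to keep
# the sketch self-contained; a real run would use the val_256_q90.rec images).
calib_data = mx.gluon.data.DataLoader(
    mx.gluon.data.ArrayDataset(mx.nd.random.uniform(shape=(320, 3, 224, 224))),
    batch_size=32)

# Keyword names such as `calib_mode`, `quantized_dtype` and `calib_data` are
# assumptions mirroring the script's CLI flags, not a confirmed signature.
qnet = quantize_net(net,
                    quantized_dtype='auto',
                    calib_mode='entropy',
                    calib_data=calib_data,
                    ctx=mx.cpu())
```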

##########
File path: src/operator/subgraph/mkldnn/mkldnn_conv.cc
##########
@@ -137,17 +147,22 @@ void SgMKLDNNConvOperator::Forward(const OpContext &ctx,
   auto in_beta = mkldnn_param.with_bn ? (idx++) : 0;
   auto in_mean = mkldnn_param.with_bn ? (idx++) : 0;
   auto in_var = mkldnn_param.with_bn ? (idx++) : 0;
-  auto in_sum = mkldnn_param.with_sum ? (idx++) : 0;
+  auto in_sum = mkldnn_param.with_sum ? (mkldnn_param.dedup_sum? in_data : idx++) : -1;
   float data_min =
       mkldnn_param.quantized ? inputs[idx++].data().dptr<float>()[0] : 0.0;
   float data_max =
       mkldnn_param.quantized ? inputs[idx++].data().dptr<float>()[0] : 0.0;
-  float sum_min = (mkldnn_param.with_sum && mkldnn_param.quantized)
-                      ? inputs[idx++].data().dptr<float>()[0]
-                      : 0.0;
-  float sum_max = (mkldnn_param.with_sum && mkldnn_param.quantized)
-                      ? inputs[idx++].data().dptr<float>()[0]
-                      : 0.0;
+  float sum_min = 0.0f;

Review comment:
       This seems to be a new feature compared to v1.x; are there any test cases to cover it?

##########
File path: example/quantization/imagenet_gen_qsym_mkldnn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:
+    # alexnet
+    # densenet121, densenet161
+    # densenet169, densenet201
+    # inceptionv3
+    # mobilenet0.25, mobilenet0.5, mobilenet0.75, mobilenet1.0,
+    # mobilenetv2_0.25, mobilenetv2_0.5, mobilenetv2_0.75, mobilenetv2_1.0
+    # resnet101_v1, resnet152_v1, resnet18_v1, resnet34_v1, resnet50_v1
+    # resnet101_v2, resnet152_v2, resnet18_v2, resnet34_v2, resnet50_v2
+    # squeezenet1.0, squeezenet1.1
+    # vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn
+    exclude_symbol_regex = {
+        'mobilenet[^v]': ['mobilenet_hybridsequential0_flatten0_flatten0', 'mobilenet_hybridsequential0_globalavgpool2d0_fwd'],
+        'mobilenetv2': ['mobilenetv2_hybridsequential1_flatten0_flatten0'],
+        # resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd is excluded for the sake of accuracy
+        'resnet.*v2': ['resnetv2_hybridsequential0_flatten0_flatten0', 'resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd'],
+        'squeezenet1': ['squeezenet_hybridsequential1_flatten0_flatten0'],
+    }
+    excluded_sym_names = regex_find_excluded_symbols(exclude_symbol_regex, model_name)
+    if excluded_sym_names is None:
+        excluded_sym_names = []
+    if exclude_first_conv:
+        first_conv_regex = {
+            'alexnet': ['alexnet_hybridsequential0_conv2d0_fwd'],
+            'densenet': ['densenet_hybridsequential0_conv2d0_fwd'],
+            'inceptionv3': ['inception3_hybridsequential0_hybridsequential0_conv2d0_fwd'],
+            'mobilenet[^v]': ['mobilenet_hybridsequential0_conv2d0_fwd'],
+            'mobilenetv2': ['mobilenetv2_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v1': ['resnetv1_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v2': ['resnetv2_hybridsequential0_conv2d0_fwd'],
+            'squeezenet1': ['squeezenet_hybridsequential0_conv2d0_fwd'],
+            'vgg': ['vgg_hybridsequential0_conv2d0_fwd'],
+        }
+        excluded_first_conv_sym_names = regex_find_excluded_symbols(first_conv_regex, model_name)
+        if excluded_first_conv_sym_names is None:
+            raise ValueError('Currently, model %s is not supported in this script' % model_name)
+        excluded_sym_names += excluded_first_conv_sym_names
+    return excluded_sym_names
+
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support')
+    parser.add_argument('--model', type=str, default='resnet50_v1',
+                        help='model to be quantized. If --no-pretrained is set, the '
+                             'model must be placed in the `model` directory in the same '
+                             'path as this python script')
+    parser.add_argument('--epoch', type=int, default=0,
+                        help='number of epochs, default is 0')
+    parser.add_argument('--no-pretrained', action='store_true', default=False,
+                        help='If enabled, will not download pretrained model from MXNet or Gluon-CV modelzoo.')
+    parser.add_argument('--batch-size', type=int, default=32)
+    parser.add_argument('--calib-dataset', type=str, default='data/val_256_q90.rec',
+                        help='path of the calibration dataset')
+    parser.add_argument('--image-shape', type=str, default='3,224,224',
+                        help='number of channels, height and width of input image separated by comma')
+    parser.add_argument('--data-nthreads', type=int, default=0,
+                        help='number of threads for data loading')
+    parser.add_argument('--num-calib-batches', type=int, default=10,
+                        help='number of batches for calibration')
+    parser.add_argument('--exclude-first-conv', action='store_true', default=False,
+                        help='exclude the first conv layer from quantization, since the'
+                             ' input data may contain negative values, which are not supported at the moment')
+    parser.add_argument('--shuffle-dataset', action='store_true',
+                        help='shuffle the calibration dataset')
+    parser.add_argument('--calib-mode', type=str, default='entropy',
+                        help='calibration mode used for generating calibration table for the quantized symbol; supports'
+                             ' 1. none: no calibration will be used. The thresholds for quantization will be calculated'
+                             ' on the fly. This will result in inference speed slowdown and loss of accuracy'
+                             ' in general.'
+                             ' 2. naive: simply take min and max values of layer outputs as thresholds for'
+                             ' quantization. In general, the inference accuracy worsens with more examples used in'
+                             ' calibration. It is recommended to use `entropy` mode as it produces more accurate'
+                             ' inference results.'
+                             ' 3. entropy: calculate KL divergence of the fp32 output and quantized output for optimal'
+                             ' thresholds. This mode is expected to produce the best inference accuracy of all three'
+                             ' kinds of calibration modes if the calibration dataset is representative enough of the'
+                             ' inference dataset.')
+    parser.add_argument('--quantized-dtype', type=str, default='auto',
+                        choices=['auto', 'int8', 'uint8'],
+                        help='quantization destination data type for input data')
+    parser.add_argument('--quiet', action='store_true', default=False,
+                        help='suppress most of log')
+    args = parser.parse_args()
+    ctx = mx.cpu(0)
+    logger = None
+
+    if not args.quiet:
+        logging.basicConfig()
+        logger = logging.getLogger('logger')
+        logger.setLevel(logging.INFO)
+
+    if logger:
+        logger.info(args)
+        logger.info('shuffle_dataset=%s' % args.shuffle_dataset)
+        logger.info('calibration mode set to %s' % args.calib_mode)
+
+    calib_mode = args.calib_mode
+
+    # download calibration dataset
+    if calib_mode != 'none':
+        idx_file_name = os.path.splitext(args.calib_dataset)[0] + '.idx'
+        if not os.path.isfile(idx_file_name):
+            download_calib_dataset('http://data.mxnet.io/data/val_256_q90.rec', args.calib_dataset)
+            creator = IndexCreator(args.calib_dataset, idx_file_name)
+            creator.create_index()
+            creator.close()
+
+    # get image shape
+    image_shape = args.image_shape
+    data_shape = [(1,) + tuple(int(i) for i in image_shape.split(','))]
+
+    # check if directory for output model exists
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    dir_path = os.path.join(dir_path, 'model')
+    if not os.path.exists(dir_path):
+        os.mkdir(dir_path) # no try/except block, as we want the script to stop
+                           # if the directory cannot be created
+
+    # download model
+    if not args.no_pretrained:
+        if logger:
+            logger.info('Get pre-trained model from Gluon-CV modelzoo.')
+            logger.info('If you want to use custom model, please set --no-pretrained.')
+        net, prefix = get_from_gluon(model_name=args.model, classes=1000, logger=logger)
+        rgb_mean = '0.485,0.456,0.406'
+        rgb_std = '0.229,0.224,0.225'

Review comment:
       Are these `mean` and `std` the new values for the Gluon models? As I remember, `mean` and `std` were ~100+ and ~58 previously.
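
For reference, 0.485/0.456/0.406 and 0.229/0.224/0.225 are the standard ImageNet normalization constants on the [0, 1] scale used after `ToTensor`, while the older ~123/~58 values correspond to raw 0-255 pixels. Below is a hedged sketch of how `rgb_mean`/`rgb_std` are presumably consumed; the transform code itself is not quoted in this excerpt, so the pipeline is an assumption for illustration.

```python
from mxnet.gluon.data.vision import transforms

rgb_mean = [0.485, 0.456, 0.406]
rgb_std = [0.229, 0.224, 0.225]

# ToTensor converts HWC uint8 [0, 255] images to CHW float32 in [0, 1],
# so the mean/std values above are expressed on the [0, 1] scale.
calib_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=rgb_mean, std=rgb_std),
])
```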

##########
File path: tests/python/quantization/test_quantization.py
##########
@@ -0,0 +1,1210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Some of the tests using CUDNN require a special GPU instruction called dp4a.
+Ref: http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
+"""
+import os
+import mxnet as mx
+import numpy as np
+from mxnet.gluon.model_zoo import vision
+from mxnet.test_utils import assert_almost_equal, assert_exception, rand_ndarray, rand_shape_nd, same, DummyIter
+from common import xfail_when_nonstandard_decimal_separator
+from mxnet.io import NDArrayIter
+import unittest
+import operator
+
+
+def initialize_block_params(block, initializer):
+    for name, param in block.collect_params('.*gamma|.*running_var|.*moving_var').items():
+        param.initialize(mx.init.Constant(1))
+    for name, param in block.collect_params('.*beta|.*bias|.*moving_mean|.*running_mean').items():
+        param.initialize(mx.init.Constant(0))
+    for name, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def collect_block_args_aux(block, sym):
+    arg_params, aux_params = dict(), dict()
+    for k, v in block.collect_params().items():
+        if k in sym.list_arguments():
+            arg_params[k] = v._reduce()
+        elif k in sym.list_auxiliary_states():
+            aux_params[k] = v._reduce()
+    return arg_params, aux_params
+
+def is_test_for_gpu():
+    return mx.current_context().device_type == 'gpu'
+
+
+def is_test_for_mkldnn():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') == '1')
+
+
+def is_test_for_native_cpu():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') is None)
+
+
+def test_quantize_float32_to_int8():
+    shape = rand_shape_nd(4)
+    data = rand_ndarray(shape, 'default', dtype='float32')
+    min_range = mx.nd.min(data)
+    max_range = mx.nd.max(data)
+    qdata, min_val, max_val = mx.nd.contrib.quantize(data, min_range, max_range, out_type='int8')
+    data_np = data.asnumpy()
+    min_range = min_range.asscalar()
+    max_range = max_range.asscalar()
+    real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+    quantized_range = 127.0
+    scale = quantized_range / real_range
+    assert qdata.dtype == np.int8
+    assert min_val.dtype == np.float32
+    assert max_val.dtype == np.float32
+    assert same(min_val.asscalar(), -real_range)
+    assert same(max_val.asscalar(), real_range)
+    qdata_np = (np.sign(data_np) * np.minimum(np.abs(data_np) * scale + 0.5, quantized_range)).astype(np.int8)
+    assert_almost_equal(qdata.asnumpy(), qdata_np, atol = 1)
+
+
+def test_dequantize_int8_to_float32():
+
+    def get_test_data(real_range, qdata_np):
+        qdata = mx.nd.array(qdata_np, dtype=np.int8)
+        min_range = mx.nd.array([-real_range], dtype=np.float32)
+        max_range = mx.nd.array([real_range], dtype=np.float32)
+        return qdata, min_range, max_range
+
+    def baseline_dequantization(qdata, real_range, qdata_np):
+        quantized_range = 127.0
+        scale = real_range / quantized_range
+        data_np = qdata_np * scale
+        return data_np
+
+    def test_nd_array_dequantization(qdata, min_range, max_range, expected_result):
+        data = mx.nd.contrib.dequantize(qdata, min_range, max_range, out_type='float32')
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    def test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result):
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
+                                            sym_max_range, out_type='float32')
+        out = dequant._bind(ctx=mx.current_context(),
+                           args={'data':qdata, 'min_range':min_range, 'max_range':max_range})
+        data = out.forward()[0]
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    real_range = 128
+    shape = rand_shape_nd(4)
+    qdata_np = np.random.uniform(low=-127, high=127, size=shape).astype(dtype=np.int8)
+    qdata, min_range, max_range = get_test_data(real_range, qdata_np)
+    expected_result = baseline_dequantization(qdata, real_range, qdata_np)
+    # test nd array implementation.
+    test_nd_array_dequantization(qdata, min_range, max_range, expected_result)
+    # test symbolic api implementation.
+    test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result)
+
+
+def test_requantize_int32_to_int8():
+    def quantized_int32_to_float(qdata, min_range, max_range):
+        assert qdata.dtype == 'int32'
+        quantized_range = np.iinfo('int32').max
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        scale = float(real_range) / float(quantized_range)
+        return qdata.astype('float32') * scale
+
+    def float_to_quantized_int8(data, min_range, max_range):
+        assert data.dtype == 'float32'
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        quantized_range = np.iinfo('int8').max
+        scale = float(quantized_range) / float(real_range)
+        return (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5, quantized_range)).astype('int8')
+
+    def requantize(qdata, min_data, max_data, real_range):
+        data = quantized_int32_to_float(qdata, min_data, max_data)
+        output = float_to_quantized_int8(data, -real_range, real_range)
+        return output, -real_range, real_range
+
+    def requantize_baseline(qdata, min_data, max_data, min_calib_range=None, max_calib_range=None):
+        if min_calib_range is not None and max_calib_range is not None:
+            real_range = np.maximum(np.abs(min_calib_range), np.abs(max_calib_range))
+            return requantize(qdata, min_data, max_data, real_range)
+        else:
+            min_range = quantized_int32_to_float(np.min(qdata), min_data, max_data)
+            max_range = quantized_int32_to_float(np.max(qdata), min_data, max_data)
+            return requantize(qdata, min_data, max_data, np.maximum(np.abs(min_range), np.abs(max_range)))
+
+    def check_requantize(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        if min_calib_range is None or max_calib_range is None:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range)
+        else:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range,
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    def check_requantize_with_symbol(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        if min_calib_range is None or max_calib_range is None:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
+            out = requant._bind(ctx=mx.current_context(),
+                               args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+        else:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range,
+                                                min_calib_range=min_calib_range,
+                                                max_calib_range=max_calib_range)
+            out = requant._bind(ctx=mx.current_context(), args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    # test with symbol API.
+    check_requantize_with_symbol((3, 4, 10, 10))
+    check_requantize_with_symbol((32, 3, 23, 23))
+    check_requantize_with_symbol((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize_with_symbol((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+    # Test with nd array API
+    check_requantize((3, 4, 10, 10))
+    check_requantize((32, 3, 23, 23))
+    check_requantize((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+
+
+def test_quantized_conv():
+    def check_quantized_conv(data_shape, kernel, num_filter, pad, stride, dilate, no_bias, qdtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_conv for native cpu since it is not supported yet')
+            return
+        elif is_test_for_mkldnn():
+            # (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
+            print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')
+            return
+        elif qdtype == 'uint8' and is_test_for_gpu():
+            print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
+            return
+        elif is_test_for_gpu() and len(data_shape) != 4:
+            print('skipped testing quantized_conv for gpu 5d layout since it is not supported yet')
+            return
+
+        # run fp32 conv
+        data = mx.sym.Variable(name='data', shape=data_shape, dtype='float32')
+        conv = mx.sym.Convolution(data=data, kernel=kernel, num_filter=num_filter, pad=pad, stride=stride,
+                                  dilate=dilate, no_bias=no_bias, cudnn_off=False, name='conv')
+        arg_shapes, _, _ = conv.infer_shape(data=data_shape)
+        arg_names = conv.list_arguments()
+        conv_exe_fp32 = conv._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qdtype == 'uint8':
+            data_low = 0.0
+            data_high = 127.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+        conv_exe_fp32.arg_dict[arg_names[0]][:] = mx.nd.random.uniform(low=data_low, high=data_high,
+                                                                       shape=data_shape).astype('int32')
+        conv_exe_fp32.arg_dict[arg_names[1]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                       shape=arg_shapes[1]).astype('int32')
+        if not no_bias:
+            conv_exe_fp32.arg_dict[arg_names[2]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                           shape=arg_shapes[2]).astype('int32')
+        output = conv_exe_fp32.forward()[0]
+
+        # run quantized conv
+        qdata = mx.sym.Variable(name='qdata', shape=data_shape, dtype=qdtype)
+        qweight = mx.sym.Variable(name='qweight', dtype='int8')
+        min_data = mx.sym.Variable(name='min_data')
+        max_data = mx.sym.Variable(name='max_data')
+        min_weight = mx.sym.Variable(name='min_weight')
+        max_weight = mx.sym.Variable(name='max_weight')
+        quantized_conv = mx.sym.contrib.quantized_conv(data=qdata, weight=qweight, min_data=min_data,
+                                                       max_data=max_data, min_weight=min_weight,
+                                                       max_weight=max_weight, kernel=kernel,
+                                                       num_filter=num_filter, pad=pad, stride=stride,
+                                                       dilate=dilate, no_bias=no_bias)
+        qarg_names = quantized_conv.list_arguments()
+        type_dict = None
+        if not no_bias:
+            type_dict = {qarg_names[2]: 'int8'}
+        conv_exe_int8 = quantized_conv._simple_bind(ctx=mx.current_context(), type_dict=type_dict, grad_req='null')
+        conv_exe_int8.arg_dict[qarg_names[0]][:] = conv_exe_fp32.arg_dict[arg_names[0]].astype(qdtype)
+        conv_exe_int8.arg_dict[qarg_names[1]][:] = conv_exe_fp32.arg_dict[arg_names[1]].astype('int8')
+        quantized_range = 127.0
+        if no_bias:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = quantized_range
+        else:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = conv_exe_fp32.arg_dict[arg_names[2]].astype('int8')
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[6]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[7]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[8]][:] = quantized_range
+        qoutput, min_range, max_range = conv_exe_int8.forward()
+
+        if no_bias:
+            assert_almost_equal(output.asnumpy(), qoutput.asnumpy(), atol = 1)
+        else:
+            # with adding bias, accuracy loss should not be greater than one
+            diff = mx.nd.abs(output - qoutput.astype(output.dtype))
+            cond = mx.nd.lesser(2, diff).sum().asscalar()
+            assert cond == 0
+
+    for qdtype in ['int8', 'uint8']:
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), True, qdtype)
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), True, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), True, qdtype)
+
+
+def test_quantized_elemwise_add():
+    def check_quantized_elemwise_add(data_shape, qtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_elemwise_add for native cpu since it is not supported yet')
+            return
+        elif qtype != 'uint8' and qtype != 'int8':
+            print('skipped testing quantized_elemwise_add for unsupported data type')
+            return
+        elif is_test_for_gpu():
+            print('skipped testing quantized_elemwise_add for gpu since it is not supported yet')
+            return
+
+        dataA = mx.sym.Variable(name='dataA', shape=data_shape, dtype='float32')
+        dataB = mx.sym.Variable(name='dataB', shape=data_shape, dtype='float32')
+        elemwise_add_fp32 = mx.sym.elemwise_add(dataA, dataB)
+        arg_names = elemwise_add_fp32.list_arguments()
+        elemwise_add_fp32_exe = elemwise_add_fp32._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qtype == 'uint8':
+            data_low = 0.0
+            data_high = 255.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+
+        dataA_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        dataB_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        elemwise_add_fp32_exe.arg_dict[arg_names[0]][:] = dataA_val
+
+        elemwise_add_fp32_exe.arg_dict[arg_names[1]][:] = dataB_val
+
+        output = elemwise_add_fp32_exe.forward()[0]
+        print(output)
+        qdataA = mx.sym.Variable(name='qdataA', shape=data_shape, dtype=qtype)
+        qdataB = mx.sym.Variable(name='qdataB', shape=data_shape, dtype=qtype)
+        min_dataA = mx.sym.Variable(name='min_dataA', dtype='float32')
+        max_dataA = mx.sym.Variable(name='max_dataA', dtype='float32')
+        min_dataB = mx.sym.Variable(name='min_dataB', dtype='float32')
+        max_dataB = mx.sym.Variable(name='max_dataB', dtype='float32')
+        quantized_elemwise_add = mx.sym.contrib.quantized_elemwise_add(qdataA, qdataB, min_dataA, max_dataA, min_dataB, max_dataB)
+        elemwise_add_int8_exe = quantized_elemwise_add._simple_bind(ctx=mx.current_context(), grad_req='null')
+        qarg_names = quantized_elemwise_add.list_arguments()
+        elemwise_add_int8_exe.arg_dict[qarg_names[0]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[0]].astype(qtype)
+        elemwise_add_int8_exe.arg_dict[qarg_names[1]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[1]].astype(qtype)
+        quantized_range = 127.0
+        elemwise_add_int8_exe.arg_dict[qarg_names[2]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[3]][:] = data_high
+        elemwise_add_int8_exe.arg_dict[qarg_names[4]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[5]][:] = data_high
+        qoutput, min_range, max_range = elemwise_add_int8_exe.forward()
+        print(qoutput)
+        int8_rslt = qoutput.astype(output.dtype)*max_range/0x7fffffff
+        print(int8_rslt)

Review comment:
       Remove these `print` statements from the test cases?
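
A side note on the last line of the quoted hunk: the `qoutput * max_range / 0x7fffffff` scaling suggests the quantized add accumulates in int32 over a symmetric range. A minimal NumPy sketch of that dequantization step follows (this is my reading of the test, not a confirmed description of the operator):

```python
import numpy as np

def dequantize_int32(qout, max_range):
    # Map a symmetric int32-quantized tensor back to float32, mirroring the
    # `qoutput.astype(...) * max_range / 0x7fffffff` line in the test above.
    return qout.astype('float32') * max_range / np.iinfo(np.int32).max

qout = np.array([0x7fffffff, 0, -0x40000000], dtype=np.int32)
print(dequantize_int32(qout, 254.0))  # roughly [254.0, 0.0, -127.0]
```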

##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model to an (U)INT8 model with Intel® MKL-DNN.

Review comment:
       As `MKL-DNN` has already been renamed to `oneDNN`, it would be better to use oneDNN in the documents. What do you think?

##########
File path: tests/python/quantization/test_quantization.py
##########
@@ -0,0 +1,1210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Some of the tests using CUDNN require a special GPU instruction called dp4a.
+Ref: http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
+"""
+import os
+import mxnet as mx
+import numpy as np
+from mxnet.gluon.model_zoo import vision
+from mxnet.test_utils import assert_almost_equal, assert_exception, rand_ndarray, rand_shape_nd, same, DummyIter
+from common import xfail_when_nonstandard_decimal_separator
+from mxnet.io import NDArrayIter
+import unittest
+import operator
+
+
+def initialize_block_params(block, initializer):
+    for name, param in block.collect_params('.*gamma|.*running_var|.*moving_var').items():
+        param.initialize(mx.init.Constant(1))
+    for name, param in block.collect_params('.*beta|.*bias|.*moving_mean|.*running_mean').items():
+        param.initialize(mx.init.Constant(0))
+    for name, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def collect_block_args_aux(block, sym):
+    arg_params, aux_params = dict(), dict()
+    for k, v in block.collect_params().items():
+        if k in sym.list_arguments():
+            arg_params[k] = v._reduce()
+        elif k in sym.list_auxiliary_states():
+            aux_params[k] = v._reduce()
+    return arg_params, aux_params
+
+def is_test_for_gpu():
+    return mx.current_context().device_type == 'gpu'
+
+
+def is_test_for_mkldnn():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') == '1')
+
+
+def is_test_for_native_cpu():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') is None)
+
+
+def test_quantize_float32_to_int8():
+    shape = rand_shape_nd(4)
+    data = rand_ndarray(shape, 'default', dtype='float32')
+    min_range = mx.nd.min(data)
+    max_range = mx.nd.max(data)
+    qdata, min_val, max_val = mx.nd.contrib.quantize(data, min_range, max_range, out_type='int8')
+    data_np = data.asnumpy()
+    min_range = min_range.asscalar()
+    max_range = max_range.asscalar()
+    real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+    quantized_range = 127.0
+    scale = quantized_range / real_range
+    assert qdata.dtype == np.int8
+    assert min_val.dtype == np.float32
+    assert max_val.dtype == np.float32
+    assert same(min_val.asscalar(), -real_range)
+    assert same(max_val.asscalar(), real_range)
+    qdata_np = (np.sign(data_np) * np.minimum(np.abs(data_np) * scale + 0.5, quantized_range)).astype(np.int8)
+    assert_almost_equal(qdata.asnumpy(), qdata_np, atol = 1)
+
+
+def test_dequantize_int8_to_float32():
+
+    def get_test_data(real_range, qdata_np):
+        qdata = mx.nd.array(qdata_np, dtype=np.int8)
+        min_range = mx.nd.array([-real_range], dtype=np.float32)
+        max_range = mx.nd.array([real_range], dtype=np.float32)
+        return qdata, min_range, max_range
+
+    def baseline_dequantization(qdata, real_range, qdata_np):
+        quantized_range = 127.0
+        scale = real_range / quantized_range
+        data_np = qdata_np * scale
+        return data_np
+
+    def test_nd_array_dequantization(qdata, min_range, max_range, expected_result):
+        data = mx.nd.contrib.dequantize(qdata, min_range, max_range, out_type='float32')
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    def test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result):
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
+                                            sym_max_range, out_type='float32')
+        out = dequant._bind(ctx=mx.current_context(),
+                           args={'data':qdata, 'min_range':min_range, 'max_range':max_range})
+        data = out.forward()[0]
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    real_range = 128
+    shape = rand_shape_nd(4)
+    qdata_np = np.random.uniform(low=-127, high=127, size=shape).astype(dtype=np.int8)
+    qdata, min_range, max_range = get_test_data(real_range, qdata_np)
+    expected_result = baseline_dequantization(qdata, real_range, qdata_np)
+    # test nd array implementation.
+    test_nd_array_dequantization(qdata, min_range, max_range, expected_result)
+    # test symbolic api implementation.
+    test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result)
+
+
+def test_requantize_int32_to_int8():
+    def quantized_int32_to_float(qdata, min_range, max_range):
+        assert qdata.dtype == 'int32'
+        quantized_range = np.iinfo('int32').max
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        scale = float(real_range) / float(quantized_range)
+        return qdata.astype('float32') * scale
+
+    def float_to_quantized_int8(data, min_range, max_range):
+        assert data.dtype == 'float32'
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        quantized_range = np.iinfo('int8').max
+        scale = float(quantized_range) / float(real_range)
+        return (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5, quantized_range)).astype('int8')
+
+    def requantize(qdata, min_data, max_data, real_range):
+        data = quantized_int32_to_float(qdata, min_data, max_data)
+        output = float_to_quantized_int8(data, -real_range, real_range)
+        return output, -real_range, real_range
+
+    def requantize_baseline(qdata, min_data, max_data, min_calib_range=None, max_calib_range=None):
+        if min_calib_range is not None and max_calib_range is not None:
+            real_range = np.maximum(np.abs(min_calib_range), np.abs(max_calib_range))
+            return requantize(qdata, min_data, max_data, real_range)
+        else:
+            min_range = quantized_int32_to_float(np.min(qdata), min_data, max_data)
+            max_range = quantized_int32_to_float(np.max(qdata), min_data, max_data)
+            return requantize(qdata, min_data, max_data, np.maximum(np.abs(min_range), np.abs(max_range)))
+
+    def check_requantize(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        if min_calib_range is None or max_calib_range is None:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range)
+        else:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range,
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    def check_requantize_with_symbol(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        if min_calib_range is None or max_calib_range is None:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
+            out = requant._bind(ctx=mx.current_context(),
+                               args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+        else:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range,
+                                                min_calib_range=min_calib_range,
+                                                max_calib_range=max_calib_range)
+            out = requant._bind(ctx=mx.current_context(), args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    # test with symbol API.
+    check_requantize_with_symbol((3, 4, 10, 10))
+    check_requantize_with_symbol((32, 3, 23, 23))
+    check_requantize_with_symbol((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize_with_symbol((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+    # Test with nd array API
+    check_requantize((3, 4, 10, 10))
+    check_requantize((32, 3, 23, 23))
+    check_requantize((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+
+
+def test_quantized_conv():
+    def check_quantized_conv(data_shape, kernel, num_filter, pad, stride, dilate, no_bias, qdtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_conv for native cpu since it is not supported yet')
+            return
+        elif is_test_for_mkldnn():
+            # (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
+            print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')
+            return
+        elif qdtype == 'uint8' and is_test_for_gpu():
+            print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
+            return
+        elif is_test_for_gpu() and len(data_shape) != 4:
+            print('skipped testing quantized_conv for gpu 5d layout since it is not supported yet')
+            return
+
+        # run fp32 conv
+        data = mx.sym.Variable(name='data', shape=data_shape, dtype='float32')
+        conv = mx.sym.Convolution(data=data, kernel=kernel, num_filter=num_filter, pad=pad, stride=stride,
+                                  dilate=dilate, no_bias=no_bias, cudnn_off=False, name='conv')
+        arg_shapes, _, _ = conv.infer_shape(data=data_shape)
+        arg_names = conv.list_arguments()
+        conv_exe_fp32 = conv._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qdtype == 'uint8':
+            data_low = 0.0
+            data_high = 127.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+        conv_exe_fp32.arg_dict[arg_names[0]][:] = mx.nd.random.uniform(low=data_low, high=data_high,
+                                                                       shape=data_shape).astype('int32')
+        conv_exe_fp32.arg_dict[arg_names[1]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                       shape=arg_shapes[1]).astype('int32')
+        if not no_bias:
+            conv_exe_fp32.arg_dict[arg_names[2]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                           shape=arg_shapes[2]).astype('int32')
+        output = conv_exe_fp32.forward()[0]
+
+        # run quantized conv
+        qdata = mx.sym.Variable(name='qdata', shape=data_shape, dtype=qdtype)
+        qweight = mx.sym.Variable(name='qweight', dtype='int8')
+        min_data = mx.sym.Variable(name='min_data')
+        max_data = mx.sym.Variable(name='max_data')
+        min_weight = mx.sym.Variable(name='min_weight')
+        max_weight = mx.sym.Variable(name='max_weight')
+        quantized_conv = mx.sym.contrib.quantized_conv(data=qdata, weight=qweight, min_data=min_data,
+                                                       max_data=max_data, min_weight=min_weight,
+                                                       max_weight=max_weight, kernel=kernel,
+                                                       num_filter=num_filter, pad=pad, stride=stride,
+                                                       dilate=dilate, no_bias=no_bias)
+        qarg_names = quantized_conv.list_arguments()
+        type_dict = None
+        if not no_bias:
+            type_dict = {qarg_names[2]: 'int8'}
+        conv_exe_int8 = quantized_conv._simple_bind(ctx=mx.current_context(), type_dict=type_dict, grad_req='null')
+        conv_exe_int8.arg_dict[qarg_names[0]][:] = conv_exe_fp32.arg_dict[arg_names[0]].astype(qdtype)
+        conv_exe_int8.arg_dict[qarg_names[1]][:] = conv_exe_fp32.arg_dict[arg_names[1]].astype('int8')
+        quantized_range = 127.0
+        if no_bias:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = quantized_range
+        else:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = conv_exe_fp32.arg_dict[arg_names[2]].astype('int8')
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[6]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[7]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[8]][:] = quantized_range
+        qoutput, min_range, max_range = conv_exe_int8.forward()
+
+        if no_bias:
+            assert_almost_equal(output.asnumpy(), qoutput.asnumpy(), atol = 1)
+        else:
+            # with adding bias, accuracy loss should not be greater than one
+            diff = mx.nd.abs(output - qoutput.astype(output.dtype))
+            cond = mx.nd.lesser(2, diff).sum().asscalar()
+            assert cond == 0
+
+    for qdtype in ['int8', 'uint8']:
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), True, qdtype)
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), True, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), True, qdtype)
+
+
+def test_quantized_elemwise_add():
+    def check_quantized_elemwise_add(data_shape, qtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_elemwise_add for native cpu since it is not supported yet')
+            return
+        elif qtype != 'uint8' and qtype != 'int8':
+            print('skipped testing quantized_elemwise_add for not supported data type')
+            return
+        elif is_test_for_gpu():
+            print('skipped testing quantized_elemwise_add for gpu since it is not supported yet')
+            return
+
+        dataA = mx.sym.Variable(name='dataA', shape=data_shape, dtype='float32')
+        dataB = mx.sym.Variable(name='dataB', shape=data_shape, dtype='float32')
+        elemwise_add_fp32 = mx.sym.elemwise_add(dataA, dataB)
+        arg_names = elemwise_add_fp32.list_arguments()
+        elemwise_add_fp32_exe = elemwise_add_fp32._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qtype == 'uint8':
+            data_low = 0.0
+            data_high = 255.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+
+        dataA_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        dataB_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        elemwise_add_fp32_exe.arg_dict[arg_names[0]][:] = dataA_val
+
+        elemwise_add_fp32_exe.arg_dict[arg_names[1]][:] = dataB_val
+
+        output = elemwise_add_fp32_exe.forward()[0]
+        print(output)
+        qdataA = mx.sym.Variable(name='qdataA', shape=data_shape, dtype=qtype)
+        qdataB = mx.sym.Variable(name='qdataB', shape=data_shape, dtype=qtype)
+        min_dataA = mx.sym.Variable(name='min_dataA', dtype='float32')
+        max_dataA = mx.sym.Variable(name='max_dataA', dtype='float32')
+        min_dataB = mx.sym.Variable(name='min_dataB', dtype='float32')
+        max_dataB = mx.sym.Variable(name='max_dataB', dtype='float32')
+        quantized_elemwise_add = mx.sym.contrib.quantized_elemwise_add(qdataA, qdataB, min_dataA, max_dataA, min_dataB, max_dataB)
+        elemwise_add_int8_exe = quantized_elemwise_add._simple_bind(ctx=mx.current_context(), grad_req='null')
+        qarg_names = quantized_elemwise_add.list_arguments()
+        elemwise_add_int8_exe.arg_dict[qarg_names[0]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[0]].astype(qtype)
+        elemwise_add_int8_exe.arg_dict[qarg_names[1]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[1]].astype(qtype)
+        quantized_range = 127.0
+        elemwise_add_int8_exe.arg_dict[qarg_names[2]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[3]][:] = data_high
+        elemwise_add_int8_exe.arg_dict[qarg_names[4]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[5]][:] = data_high
+        qoutput, min_range, max_range = elemwise_add_int8_exe.forward()
+        print(qoutput)
+        int8_rslt = qoutput.astype(output.dtype)*max_range/0x7fffffff
+        print(int8_rslt)
+        diff = mx.nd.abs(output - int8_rslt)
+        cond = mx.nd.lesser(2, diff).sum().asscalar()
+        assert cond == 0
+
+    for qtype in ['int8', 'uint8']:
+        check_quantized_elemwise_add((4, 6), qtype)
+        # check_quantized_elemwise_add((13, 74, 52), qtype)

Review comment:
       Any reason for commenting out these cases?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-739063429


   @mxnet-bot run ci [windows-gpu, edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
ciyongch commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539765385



##########
File path: example/quantization/imagenet_inference.py
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import time
+
+import mxnet as mx
+import numpy as np
+from mxnet import gluon
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+
+
+def download_dataset(dataset_url, dataset_dir, logger=None):
+    if logger is not None:
+        logger.info('Downloading dataset for inference from %s to %s' % (dataset_url, dataset_dir))
+    mx.test_utils.download(dataset_url, dataset_dir)
+
+
+def score(symblock, data, ctx, max_num_examples, skip_num_batches, logger=None):
+    metrics = [gluon.metric.create('acc')]#,
+            #    gluon.metric.create('top_k_accuracy', top_k=5)]
+
+    # make sure that fp32 inference works on the same images as calibrated quantized model
+    logger.info('Skipping the first %d batches' % skip_num_batches)
+
+    tic = time.time()
+    num = 0
+    for i, input_data in enumerate(data):
+        if i < skip_num_batches:
+            continue
+        x = input_data[0].as_in_context(ctx)
+        label = input_data[1].as_in_context(ctx)
+        outputs = symblock.forward(x)
+        for m in metrics:
+            m.update(label, outputs)
+        num += batch_size
+        if max_num_examples is not None and num >= max_num_examples:
+            break
+
+    speed = num / (time.time() - tic)
+
+    if logger is not None:
+        logger.info('Finished inference with %d images' % num)
+        logger.info('Finished with %f images per second', speed)
+        for m in metrics:
+            logger.info(m.get())
+
+def initialize_block_params(block, initializer):
+    for _, param in block.collect_params('.*gamma|.*moving_var|.*running_var').items():
+        param.initialize(mx.init.Constant(1))
+    for _, param in block.collect_params('.*beta|.*moving_mean|.*running_mean|.*bias').items():
+        param.initialize(mx.init.Constant(0))
+    for _, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def benchmark_score(symblock, ctx, batch_size, warmup_batches, num_batches, data_layer_type):
+    if data_layer_type == "int8":
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.int8)
+    elif data_layer_type == 'uint8':
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.uint8)
+    else:  # float32
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.float32)
+
+    # get data
+    if data_layer_type == "float32":
+        data = [mx.random.uniform(-1.0, 1.0, shape=shape, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+    else:
+        data = [mx.nd.full(shape=shape, val=127, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+
+    # run
+    for i in range(warmup_batches+num_batches):
+        if i == warmup_batches:
+            tic = time.time()
+        outputs = symblock.forward(*data)
+        for output in outputs:
+            output.wait_to_read()
+
+    # return num images per second
+    return num_batches * batch_size / (time.time() - tic)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Score a model on a dataset')
+    parser.add_argument('--benchmark', type=bool, default=False, help='dummy data benchmark')
+    parser.add_argument('--symbol-file', type=str, required=True, help='symbol file path')
+    parser.add_argument('--param-file', type=str, required=False, help='param file path')
+    parser.add_argument('--batch-size', type=int, default=32)
+    parser.add_argument('--dataset', type=str, required=False, help='dataset path')
+    parser.add_argument('--rgb-mean', type=str, default='0,0,0')
+    parser.add_argument('--rgb-std', type=str, default='1,1,1')
+    parser.add_argument('--image-shape', type=str, default='3,224,224')
+    parser.add_argument('--data-nthreads', type=int, default=60, help='number of threads for data decoding')
+    parser.add_argument('--num-skipped-batches', type=int, default=0, help='skip the number of batches for inference')
+    parser.add_argument('--num-inference-batches', type=int, required=True, help='number of images used for inference')
+    parser.add_argument('--num-warmup-batches', type=int, default=5, help='number of warmup batches used for benchmark')
+    parser.add_argument('--shuffle-dataset', action='store_true', default=True,
+                        help='shuffle the score dataset')
+    parser.add_argument('--data-layer-type', type=str, default='float32',
+                        choices=['float32', 'int8', 'uint8'],
+                        help='data type for data layer (only with --benchmark)')
+
+    args = parser.parse_args()
+
+    ctx = mx.cpu(0)

Review comment:
       Sure, sounds good :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742464736


   We don't need a new RFC since this is a migration of the same approach (quantization flow) from 1.x to 2.x.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536242416



##########
File path: example/quantization/imagenet_gen_qsym_mkldnn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:
+    # alexnet
+    # densenet121, densenet161
+    # densenet169, densenet201
+    # inceptionv3
+    # mobilenet0.25, mobilenet0.5, mobilenet0.75, mobilenet1.0,
+    # mobilenetv2_0.25, mobilenetv2_0.5, mobilenetv2_0.75, mobilenetv2_1.0
+    # resnet101_v1, resnet152_v1, resnet18_v1, resnet34_v1, resnet50_v1
+    # resnet101_v2, resnet152_v2, resnet18_v2, resnet34_v2, resnet50_v2
+    # squeezenet1.0, squeezenet1.1
+    # vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn
+    exclude_symbol_regex = {
+        'mobilenet[^v]': ['mobilenet_hybridsequential0_flatten0_flatten0', 'mobilenet_hybridsequential0_globalavgpool2d0_fwd'],
+        'mobilenetv2': ['mobilenetv2_hybridsequential1_flatten0_flatten0'],
+        # resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd is excluded for the sake of accuracy
+        'resnet.*v2': ['resnetv2_hybridsequential0_flatten0_flatten0', 'resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd'],
+        'squeezenet1': ['squeezenet_hybridsequential1_flatten0_flatten0'],
+    }
+    excluded_sym_names = regex_find_excluded_symbols(exclude_symbol_regex, model_name)
+    if excluded_sym_names is None:
+        excluded_sym_names = []
+    if exclude_first_conv:
+        first_conv_regex = {
+            'alexnet': ['alexnet_hybridsequential0_conv2d0_fwd'],
+            'densenet': ['densenet_hybridsequential0_conv2d0_fwd'],
+            'inceptionv3': ['inception3_hybridsequential0_hybridsequential0_conv2d0_fwd'],
+            'mobilenet[^v]': ['mobilenet_hybridsequential0_conv2d0_fwd'],
+            'mobilenetv2': ['mobilenetv2_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v1': ['resnetv1_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v2': ['resnetv2_hybridsequential0_conv2d0_fwd'],
+            'squeezenet1': ['squeezenet_hybridsequential0_conv2d0_fwd'],
+            'vgg': ['vgg_hybridsequential0_conv2d0_fwd'],
+        }
+        excluded_first_conv_sym_names = regex_find_excluded_symbols(first_conv_regex, model_name)
+        if excluded_first_conv_sym_names is None:
+            raise ValueError('Currently, model %s is not supported in this script' % model_name)
+        excluded_sym_names += excluded_first_conv_sym_names
+    return excluded_sym_names
+
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support')
+    parser.add_argument('--model', type=str, default='resnet50_v1',
+                        help='model to be quantized. If no-pretrained is set then'
+                             'model must be provided to `model` directory in the same path'
+                             'as this python script')
+    parser.add_argument('--epoch', type=int, default=0,
+                        help='number of epochs, default is 0')
+    parser.add_argument('--no-pretrained', action='store_true', default=False,
+                        help='If enabled, will not download pretrained model from MXNet or Gluon-CV modelzoo.')
+    parser.add_argument('--batch-size', type=int, default=32)
+    parser.add_argument('--calib-dataset', type=str, default='data/val_256_q90.rec',
+                        help='path of the calibration dataset')
+    parser.add_argument('--image-shape', type=str, default='3,224,224',
+                        help='number of channels, height and width of input image separated by comma')
+    parser.add_argument('--data-nthreads', type=int, default=0,
+                        help='number of threads for data loading')
+    parser.add_argument('--num-calib-batches', type=int, default=10,
+                        help='number of batches for calibration')
+    parser.add_argument('--exclude-first-conv', action='store_true', default=False,
+                        help='excluding quantizing the first conv layer since the'
+                             ' input data may have negative value which doesn\'t support at moment' )
+    parser.add_argument('--shuffle-dataset', action='store_true',
+                        help='shuffle the calibration dataset')
+    parser.add_argument('--calib-mode', type=str, default='entropy',
+                        help='calibration mode used for generating calibration table for the quantized symbol; supports'
+                             ' 1. none: no calibration will be used. The thresholds for quantization will be calculated'
+                             ' on the fly. This will result in inference speed slowdown and loss of accuracy'
+                             ' in general.'
+                             ' 2. naive: simply take min and max values of layer outputs as thresholds for'
+                             ' quantization. In general, the inference accuracy worsens with more examples used in'
+                             ' calibration. It is recommended to use `entropy` mode as it produces more accurate'
+                             ' inference results.'
+                             ' 3. entropy: calculate KL divergence of the fp32 output and quantized output for optimal'
+                             ' thresholds. This mode is expected to produce the best inference accuracy of all three'
+                             ' kinds of calibration modes if the calibration dataset is representative enough of the'
+                             ' inference dataset.')
+    parser.add_argument('--quantized-dtype', type=str, default='auto',
+                        choices=['auto', 'int8', 'uint8'],
+                        help='quantization destination data type for input data')
+    parser.add_argument('--quiet', action='store_true', default=False,
+                        help='suppress most of log')
+    args = parser.parse_args()
+    ctx = mx.cpu(0)
+    logger = None
+
+    if not args.quiet:
+        logging.basicConfig()
+        logger = logging.getLogger('logger')
+        logger.setLevel(logging.INFO)
+
+    if logger:
+        logger.info(args)
+        logger.info('shuffle_dataset=%s' % args.shuffle_dataset)
+        logger.info('calibration mode set to %s' % args.calib_mode)
+
+    calib_mode = args.calib_mode
+
+    # download calibration dataset
+    if calib_mode != 'none':
+        idx_file_name = os.path.splitext(args.calib_dataset)[0] + '.idx'
+        if not os.path.isfile(idx_file_name):
+            download_calib_dataset('http://data.mxnet.io/data/val_256_q90.rec', args.calib_dataset)
+            creator = IndexCreator(args.calib_dataset, idx_file_name)
+            creator.create_index()
+            creator.close()
+
+    # get image shape
+    image_shape = args.image_shape
+    data_shape = [(1,) + tuple(int(i) for i in image_shape.split(','))]
+
+    # check if directory for output model exists
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    dir_path = os.path.join(dir_path, 'model')
+    if not os.path.exists(dir_path):
+        os.mkdir(dir_path) # without try catch block as we expect to finish
+                           # script if it fail
+
+    # download model
+    if not args.no_pretrained:
+        if logger:
+            logger.info('Get pre-trained model from Gluon-CV modelzoo.')
+            logger.info('If you want to use custom model, please set --no-pretrained.')
+        net, prefix = get_from_gluon(model_name=args.model, classes=1000, logger=logger)
+        rgb_mean = '0.485,0.456,0.406'
+        rgb_std = '0.229,0.224,0.225'

Review comment:
       It's because DataIter, which was used with Module, probably doesn't scale the network input to the range [0, 1]. Now we're using gluon.DataLoader with transforms.ToTensor() as one of the transformations, and it scales input values to the range [0, 1] - so the mean and std values were recalculated from the old [0, 255] range to the [0, 1] range.
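   A minimal sketch of that calibration pipeline (illustration only, not code from this PR; the exact transform chain, record file path and batch size are assumptions). transforms.ToTensor() converts HWC uint8 images in [0, 255] to CHW float32 in [0, 1], which is why Normalize() has to use mean/std already expressed in the [0, 1] range:

       from mxnet.gluon.data import DataLoader
       from mxnet.gluon.data.vision import ImageRecordDataset, transforms

       # ImageNet mean/std expressed in the [0, 1] range, matching ToTensor() output
       rgb_mean = (0.485, 0.456, 0.406)
       rgb_std = (0.229, 0.224, 0.225)

       transform = transforms.Compose([
           transforms.Resize(256),
           transforms.CenterCrop(224),
           transforms.ToTensor(),                    # uint8 [0, 255] -> float32 [0, 1]
           transforms.Normalize(rgb_mean, rgb_std),  # mean/std already on the [0, 1] scale
       ])

       # record file path used only for illustration
       dataset = ImageRecordDataset('data/val_256_q90.rec').transform_first(transform)
       calib_data = DataLoader(dataset, batch_size=32, num_workers=0)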




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
ciyongch commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536528388



##########
File path: example/quantization/imagenet_inference.py
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import time
+
+import mxnet as mx
+import numpy as np
+from mxnet import gluon
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+
+
+def download_dataset(dataset_url, dataset_dir, logger=None):
+    if logger is not None:
+        logger.info('Downloading dataset for inference from %s to %s' % (dataset_url, dataset_dir))
+    mx.test_utils.download(dataset_url, dataset_dir)
+
+
+def score(symblock, data, ctx, max_num_examples, skip_num_batches, logger=None):
+    metrics = [gluon.metric.create('acc')]#,
+            #    gluon.metric.create('top_k_accuracy', top_k=5)]
+
+    # make sure that fp32 inference works on the same images as calibrated quantized model
+    logger.info('Skipping the first %d batches' % skip_num_batches)
+
+    tic = time.time()
+    num = 0
+    for i, input_data in enumerate(data):
+        if i < skip_num_batches:
+            continue
+        x = input_data[0].as_in_context(ctx)
+        label = input_data[1].as_in_context(ctx)
+        outputs = symblock.forward(x)
+        for m in metrics:
+            m.update(label, outputs)
+        num += batch_size
+        if max_num_examples is not None and num >= max_num_examples:
+            break
+
+    speed = num / (time.time() - tic)
+
+    if logger is not None:
+        logger.info('Finished inference with %d images' % num)
+        logger.info('Finished with %f images per second', speed)
+        for m in metrics:
+            logger.info(m.get())
+
+def initialize_block_params(block, initializer):
+    for _, param in block.collect_params('.*gamma|.*moving_var|.*running_var').items():
+        param.initialize(mx.init.Constant(1))
+    for _, param in block.collect_params('.*beta|.*moving_mean|.*running_mean|.*bias').items():
+        param.initialize(mx.init.Constant(0))
+    for _, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def benchmark_score(symblock, ctx, batch_size, warmup_batches, num_batches, data_layer_type):
+    if data_layer_type == "int8":
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.int8)
+    elif data_layer_type == 'uint8':
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.uint8)
+    else:  # float32
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.float32)
+
+    # get data
+    if data_layer_type == "float32":
+        data = [mx.random.uniform(-1.0, 1.0, shape=shape, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+    else:
+        data = [mx.nd.full(shape=shape, val=127, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+
+    # run
+    for i in range(warmup_batches+num_batches):
+        if i == warmup_batches:
+            tic = time.time()
+        outputs = symblock.forward(*data)
+        for output in outputs:
+            output.wait_to_read()
+
+    # return num images per second
+    return num_batches * batch_size / (time.time() - tic)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Score a model on a dataset')
+    parser.add_argument('--benchmark', type=bool, default=False, help='dummy data benchmark')
+    parser.add_argument('--symbol-file', type=str, required=True, help='symbol file path')
+    parser.add_argument('--param-file', type=str, required=False, help='param file path')
+    parser.add_argument('--batch-size', type=int, default=32)
+    parser.add_argument('--dataset', type=str, required=False, help='dataset path')
+    parser.add_argument('--rgb-mean', type=str, default='0,0,0')
+    parser.add_argument('--rgb-std', type=str, default='1,1,1')
+    parser.add_argument('--image-shape', type=str, default='3,224,224')
+    parser.add_argument('--data-nthreads', type=int, default=60, help='number of threads for data decoding')
+    parser.add_argument('--num-skipped-batches', type=int, default=0, help='skip the number of batches for inference')
+    parser.add_argument('--num-inference-batches', type=int, required=True, help='number of images used for inference')
+    parser.add_argument('--num-warmup-batches', type=int, default=5, help='number of warmup batches used for benchmark')
+    parser.add_argument('--shuffle-dataset', action='store_true', default=True,
+                        help='shuffle the score dataset')
+    parser.add_argument('--data-layer-type', type=str, default='float32',
+                        choices=['float32', 'int8', 'uint8'],
+                        help='data type for data layer (only with --benchmark)')
+
+    args = parser.parse_args()
+
+    ctx = mx.cpu(0)

Review comment:
       Can we restore the option for choosing a different `ctx` here and leave an error message for the unsupported ones (like gpu, etc.)?
   
   BTW, previously this script also supported low-precision data types like `bf16` and `fp16`; although they're not related to INT8/quantization, I'm just wondering if there's any limitation to supporting AMP in the current master code base?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-734875898


   @mxnet-bot run ci [windows-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539281949



##########
File path: example/quantization/imagenet_inference.py
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import time
+
+import mxnet as mx
+import numpy as np
+from mxnet import gluon
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+
+
+def download_dataset(dataset_url, dataset_dir, logger=None):
+    if logger is not None:
+        logger.info('Downloading dataset for inference from %s to %s' % (dataset_url, dataset_dir))
+    mx.test_utils.download(dataset_url, dataset_dir)
+
+
+def score(symblock, data, ctx, max_num_examples, skip_num_batches, logger=None):
+    metrics = [gluon.metric.create('acc')]#,
+            #    gluon.metric.create('top_k_accuracy', top_k=5)]
+
+    # make sure that fp32 inference works on the same images as calibrated quantized model
+    logger.info('Skipping the first %d batches' % skip_num_batches)
+
+    tic = time.time()
+    num = 0
+    for i, input_data in enumerate(data):
+        if i < skip_num_batches:
+            continue
+        x = input_data[0].as_in_context(ctx)
+        label = input_data[1].as_in_context(ctx)
+        outputs = symblock.forward(x)
+        for m in metrics:
+            m.update(label, outputs)
+        num += batch_size
+        if max_num_examples is not None and num >= max_num_examples:
+            break
+
+    speed = num / (time.time() - tic)
+
+    if logger is not None:
+        logger.info('Finished inference with %d images' % num)
+        logger.info('Finished with %f images per second', speed)
+        for m in metrics:
+            logger.info(m.get())
+
+def initialize_block_params(block, initializer):
+    for _, param in block.collect_params('.*gamma|.*moving_var|.*running_var').items():
+        param.initialize(mx.init.Constant(1))
+    for _, param in block.collect_params('.*beta|.*moving_mean|.*running_mean|.*bias').items():
+        param.initialize(mx.init.Constant(0))
+    for _, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def benchmark_score(symblock, ctx, batch_size, warmup_batches, num_batches, data_layer_type):
+    if data_layer_type == "int8":
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.int8)
+    elif data_layer_type == 'uint8':
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.uint8)
+    else:  # float32
+        dshape = mx.io.DataDesc(name='data', shape=(
+            batch_size,) + data_shape, dtype=np.float32)
+
+    # get data
+    if data_layer_type == "float32":
+        data = [mx.random.uniform(-1.0, 1.0, shape=shape, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+    else:
+        data = [mx.nd.full(shape=shape, val=127, ctx=ctx, dtype=data_layer_type)
+                for _, shape in [dshape]]
+
+    # run
+    for i in range(warmup_batches+num_batches):
+        if i == warmup_batches:
+            tic = time.time()
+        outputs = symblock.forward(*data)
+        for output in outputs:
+            output.wait_to_read()
+
+    # return num images per second
+    return num_batches * batch_size / (time.time() - tic)
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Score a model on a dataset')
+    parser.add_argument('--benchmark', type=bool, default=False, help='dummy data benchmark')
+    parser.add_argument('--symbol-file', type=str, required=True, help='symbol file path')
+    parser.add_argument('--param-file', type=str, required=False, help='param file path')
+    parser.add_argument('--batch-size', type=int, default=32)
+    parser.add_argument('--dataset', type=str, required=False, help='dataset path')
+    parser.add_argument('--rgb-mean', type=str, default='0,0,0')
+    parser.add_argument('--rgb-std', type=str, default='1,1,1')
+    parser.add_argument('--image-shape', type=str, default='3,224,224')
+    parser.add_argument('--data-nthreads', type=int, default=60, help='number of threads for data decoding')
+    parser.add_argument('--num-skipped-batches', type=int, default=0, help='skip the number of batches for inference')
+    parser.add_argument('--num-inference-batches', type=int, required=True, help='number of images used for inference')
+    parser.add_argument('--num-warmup-batches', type=int, default=5, help='number of warmup batches used for benchmark')
+    parser.add_argument('--shuffle-dataset', action='store_true', default=True,
+                        help='shuffle the score dataset')
+    parser.add_argument('--data-layer-type', type=str, default='float32',
+                        choices=['float32', 'int8', 'uint8'],
+                        help='data type for data layer (only with --benchmark)')
+
+    args = parser.parse_args()
+
+    ctx = mx.cpu(0)

Review comment:
       I restored ctx and added a warning when using GPU.
   
   I don't think there are any limitations for bf16/fp16 - we just removed them to stay consistent, since the example folder is called quantization. We can add bfloat16 examples to the automatic-mixed-precision folder in another PR.
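   For reference, a hedged sketch of what a restored `ctx` option with a GPU warning could look like (illustration only; the actual flag name and message in the PR may differ):

       import argparse
       import logging

       import mxnet as mx

       parser = argparse.ArgumentParser()
       parser.add_argument('--ctx', type=str, default='cpu', choices=['cpu', 'gpu'],
                           help='context to run inference on')
       args = parser.parse_args()

       if args.ctx == 'gpu':
           logging.warning('INT8 quantized inference in these examples is optimized for CPU '
                           'with oneDNN; GPU support may be limited.')
           ctx = mx.gpu(0)
       else:
           ctx = mx.cpu(0)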




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742450191


   Jenkins CI successfully triggered : [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
ciyongch commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536526571



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN to (U)INT8 model.

Review comment:
       That's more than what I meant, but it's good :) I was only referring to changing the name from `MKL-DNN` to `oneDNN` in the documentation files or the related descriptions in the source code files at this moment :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-735750971


   @mxnet-bot run ci [windows-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-739063455


   Jenkins CI successfully triggered : [windows-gpu, edge]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536238061



##########
File path: src/operator/subgraph/mkldnn/mkldnn_conv.cc
##########
@@ -137,17 +147,22 @@ void SgMKLDNNConvOperator::Forward(const OpContext &ctx,
   auto in_beta = mkldnn_param.with_bn ? (idx++) : 0;
   auto in_mean = mkldnn_param.with_bn ? (idx++) : 0;
   auto in_var = mkldnn_param.with_bn ? (idx++) : 0;
-  auto in_sum = mkldnn_param.with_sum ? (idx++) : 0;
+  auto in_sum = mkldnn_param.with_sum ? (mkldnn_param.dedup_sum? in_data : idx++) : -1;
   float data_min =
       mkldnn_param.quantized ? inputs[idx++].data().dptr<float>()[0] : 0.0;
   float data_max =
       mkldnn_param.quantized ? inputs[idx++].data().dptr<float>()[0] : 0.0;
-  float sum_min = (mkldnn_param.with_sum && mkldnn_param.quantized)
-                      ? inputs[idx++].data().dptr<float>()[0]
-                      : 0.0;
-  float sum_max = (mkldnn_param.with_sum && mkldnn_param.quantized)
-                      ? inputs[idx++].data().dptr<float>()[0]
-                      : 0.0;
+  float sum_min = 0.0f;

Review comment:
       The Subgraph API provides deduplication of inputs to a subgraph (e.g. the residual block in mobilenet).
   The test for this is in tests/python/mkl/test_subgraph.py::test_dedup
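   As a rough illustration of what this deduplication means (a sketch under assumptions, not the PR code): in a residual block the same tensor feeds both the convolution path and the elementwise sum, so without deduplication the fused conv+sum subgraph would list that tensor twice among its inputs.

       import mxnet as mx

       data = mx.sym.Variable('data')
       conv = mx.sym.Convolution(data=data, num_filter=16, kernel=(3, 3), pad=(1, 1), name='conv')
       # 'data' is consumed a second time as the residual input of the sum; after fusion,
       # dedup lets the subgraph reuse the already-listed 'data' input instead of carrying
       # a duplicate edge.
       out = mx.sym.elemwise_add(conv, data, name='residual_sum')
       print(out.list_arguments())  # 'data' appears once even though it is used twice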




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742513211


   @Szha is something wrong with the unix-gpu CI? I got the following error twice in a row:
   - [2020-12-10T12:17:34.769Z] Cannot contact mxnetlinux-cpu_cwcmun1huz: java.lang.InterruptedException
   ...
   [2020-12-10T12:28:54.084Z] FAILED: CMakeFiles/mxnet.dir/src/operator/numpy/np_kron.cc.o 
   ...
   [2020-12-10T12:28:54.084Z] c++: fatal error: Killed signal terminated program cc1plus
   
   but in the run before, in the GPU: MKLDNN job, it was CMakeFiles/mxnet.dir/src/operator/numpy/np_elemwise_broadcast_logic_op.cc.o 
   
   I don't think it's related to my change 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539282606



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN to (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on Intel® CPU Platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply quantization flow to your project directly, please refer [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        model must be provided to `model` directory in the same path
+                        as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  excluding quantizing the first conv layer since the
+                        input data may have negative value which doesn't
+                        support at moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of the log output
+```
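+
+Below is a minimal, illustrative sketch of the same calibration flow driven directly from Python
+through `quantize_net` (the model choice, the random calibration data and the exact keyword
+arguments are assumptions for illustration; see the script options above for what is actually
+exposed):
+
+```python
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.model_zoo.vision import get_model
+
+# FP32 Gluon model to be quantized
+net = get_model('resnet50_v1', pretrained=True)
+
+# Illustrative only: random tensors stand in for a real calibration dataset
+calib_dataset = gluon.data.ArrayDataset(mx.nd.random.uniform(shape=(64, 3, 224, 224)),
+                                        mx.nd.zeros((64,)))
+calib_loader = DataLoader(calib_dataset, batch_size=32)
+
+# Calibrate with the 'naive' (min/max) mode; the destination dtype is picked automatically
+qnet = quantize_net(net, quantized_dtype='auto', calib_mode='naive',
+                    calib_data=calib_loader, ctx=mx.cpu())
+
+# The returned network behaves like a regular HybridBlock and runs INT8 inference
+out = qnet(mx.nd.random.uniform(shape=(1, 3, 224, 224)))
+```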
+
+A new benchmark script `launch_inference_mkldnn.sh` has been designed to launch performance benchmarks for float32 or int8 image-classification models with Intel® MKL-DNN.
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteration] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help                show this help message and exit
+  -s, --symbol_file         symbol file for benchmark, required
+  -b, --batch_size          inference batch size
+                            default: 64
+  -iter, --iteration        inference iteration
+                            default: 500
+  -ins, --instance          launch multi-instance inference
+                            default: one instance per socket
+  -c, --core                number of cores per instance
+                            default: all physical cores divided evenly among instances
+
+example: ResNet int8 performance benchmark on c5.24xlarge (dual-socket, 24 physical cores per socket).
+
+    bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for the throughput benchmark, each using 24 physical cores.
+```
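+
+To pin the benchmark to an explicit number of instances and cores per instance (the values below
+are only an illustration of the flags listed above, assuming the same 48-core machine), the script
+can be invoked as:
+
+```
+bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json -b 64 -iter 500 -ins 4 -c 12
+```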
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
       I've added verified models with tested accuracy




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742450130


   @mxnet-bot run ci [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742513309


   @mxnet-bot run ci [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742480448


   @mxnet-bot run ci [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742029553


   Jenkins CI successfully triggered : [unix-cpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] pengzhao-intel merged pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
pengzhao-intel merged pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536243581



##########
File path: tests/python/quantization/test_quantization.py
##########
@@ -0,0 +1,1210 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Some of the tests using CUDNN require a special GPU instruction called dp4a.
+Ref: http://images.nvidia.com/content/pdf/tesla/184457-Tesla-P4-Datasheet-NV-Final-Letter-Web.pdf
+"""
+import os
+import mxnet as mx
+import numpy as np
+from mxnet.gluon.model_zoo import vision
+from mxnet.test_utils import assert_almost_equal, assert_exception, rand_ndarray, rand_shape_nd, same, DummyIter
+from common import xfail_when_nonstandard_decimal_separator
+from mxnet.io import NDArrayIter
+import unittest
+import operator
+
+
+def initialize_block_params(block, initializer):
+    for name, param in block.collect_params('.*gamma|.*running_var|.*moving_var').items():
+        param.initialize(mx.init.Constant(1))
+    for name, param in block.collect_params('.*beta|.*bias|.*moving_mean|.*running_mean').items():
+        param.initialize(mx.init.Constant(0))
+    for name, param in block.collect_params('.*weight').items():
+        param.initialize(initializer)
+
+def collect_block_args_aux(block, sym):
+    arg_params, aux_params = dict(), dict()
+    for k, v in block.collect_params().items():
+        if k in sym.list_arguments():
+            arg_params[k] = v._reduce()
+        elif k in sym.list_auxiliary_states():
+            aux_params[k] = v._reduce()
+    return arg_params, aux_params
+
+def is_test_for_gpu():
+    return mx.current_context().device_type == 'gpu'
+
+
+def is_test_for_mkldnn():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') == '1')
+
+
+def is_test_for_native_cpu():
+    return (mx.current_context().device_type == 'cpu'
+            and os.environ.get('ENABLE_MKLDNN_QUANTIZATION_TEST') is None)
+
+
+def test_quantize_float32_to_int8():
+    shape = rand_shape_nd(4)
+    data = rand_ndarray(shape, 'default', dtype='float32')
+    min_range = mx.nd.min(data)
+    max_range = mx.nd.max(data)
+    qdata, min_val, max_val = mx.nd.contrib.quantize(data, min_range, max_range, out_type='int8')
+    data_np = data.asnumpy()
+    min_range = min_range.asscalar()
+    max_range = max_range.asscalar()
+    real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+    quantized_range = 127.0
+    scale = quantized_range / real_range
+    assert qdata.dtype == np.int8
+    assert min_val.dtype == np.float32
+    assert max_val.dtype == np.float32
+    assert same(min_val.asscalar(), -real_range)
+    assert same(max_val.asscalar(), real_range)
+    qdata_np = (np.sign(data_np) * np.minimum(np.abs(data_np) * scale + 0.5, quantized_range)).astype(np.int8)
+    assert_almost_equal(qdata.asnumpy(), qdata_np, atol = 1)
+
+
+def test_dequantize_int8_to_float32():
+
+    def get_test_data(real_range, qdata_np):
+        qdata = mx.nd.array(qdata_np, dtype=np.int8)
+        min_range = mx.nd.array([-real_range], dtype=np.float32)
+        max_range = mx.nd.array([real_range], dtype=np.float32)
+        return qdata, min_range, max_range
+
+    def baseline_dequantization(qdata, real_range, qdata_np):
+        quantized_range = 127.0
+        scale = real_range / quantized_range
+        data_np = qdata_np * scale
+        return data_np
+
+    def test_nd_array_dequantization(qdata, min_range, max_range, expected_result):
+        data = mx.nd.contrib.dequantize(qdata, min_range, max_range, out_type='float32')
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    def test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result):
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        dequant = mx.sym.contrib.dequantize(sym_data, sym_min_range,
+                                            sym_max_range, out_type='float32')
+        out = dequant._bind(ctx=mx.current_context(),
+                           args={'data':qdata, 'min_range':min_range, 'max_range':max_range})
+        data = out.forward()[0]
+        assert data.dtype == np.float32
+        assert_almost_equal(data.asnumpy(), expected_result, atol = 1)
+
+    real_range = 128
+    shape = rand_shape_nd(4)
+    qdata_np = np.random.uniform(low=-127, high=127, size=shape).astype(dtype=np.int8)
+    qdata, min_range, max_range = get_test_data(real_range, qdata_np)
+    expected_result = baseline_dequantization(qdata, real_range, qdata_np)
+    # test nd array implementation.
+    test_nd_array_dequantization(qdata, min_range, max_range, expected_result)
+    # test symbolic api implementaion.
+    test_symbolic_api_dequantization(qdata, min_range, max_range, expected_result)
+
+
+def test_requantize_int32_to_int8():
+    def quantized_int32_to_float(qdata, min_range, max_range):
+        assert qdata.dtype == 'int32'
+        quantized_range = np.iinfo('int32').max
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        scale = float(real_range) / float(quantized_range)
+        return qdata.astype('float32') * scale
+
+    def float_to_quantized_int8(data, min_range, max_range):
+        assert data.dtype == 'float32'
+        real_range = np.maximum(np.abs(min_range), np.abs(max_range))
+        quantized_range = np.iinfo('int8').max
+        scale = float(quantized_range) / float(real_range)
+        return (np.sign(data) * np.minimum(np.abs(data) * scale + 0.5, quantized_range)).astype('int8')
+
+    def requantize(qdata, min_data, max_data, real_range):
+        data = quantized_int32_to_float(qdata, min_data, max_data)
+        output = float_to_quantized_int8(data, -real_range, real_range)
+        return output, -real_range, real_range
+
+    def requantize_baseline(qdata, min_data, max_data, min_calib_range=None, max_calib_range=None):
+        if min_calib_range is not None and max_calib_range is not None:
+            real_range = np.maximum(np.abs(min_calib_range), np.abs(max_calib_range))
+            return requantize(qdata, min_data, max_data, real_range)
+        else:
+            min_range = quantized_int32_to_float(np.min(qdata), min_data, max_data)
+            max_range = quantized_int32_to_float(np.max(qdata), min_data, max_data)
+            return requantize(qdata, min_data, max_data, np.maximum(np.abs(min_range), np.abs(max_range)))
+
+    def check_requantize(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        if min_calib_range is None or max_calib_range is None:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range)
+        else:
+            qdata_int8, min_output, max_output = mx.nd.contrib.requantize(qdata, min_range, max_range,
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    def check_requantize_with_symbol(shape, min_calib_range=None, max_calib_range=None):
+        qdata = mx.nd.random.uniform(low=-1000.0, high=1000.0, shape=shape).astype('int32')
+        min_range = mx.nd.array([-1010.0])
+        max_range = mx.nd.array([1020.0])
+        sym_data = mx.sym.Variable('data')
+        sym_min_range = mx.sym.Variable('min_range')
+        sym_max_range = mx.sym.Variable('max_range')
+        if min_calib_range is None or max_calib_range is None:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range)
+            out = requant._bind(ctx=mx.current_context(),
+                               args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+        else:
+            requant = mx.sym.contrib.requantize(sym_data, sym_min_range, sym_max_range,
+                                                min_calib_range=min_calib_range,
+                                                max_calib_range=max_calib_range)
+            out = requant._bind(ctx=mx.current_context(), args={'data':qdata, 'min_range':min_range,
+                               'max_range':max_range})
+            qdata_int8, min_output, max_output = out.forward()
+
+        qdata_int8_np, min_output_np, max_output_np = requantize_baseline(qdata.asnumpy(), min_range.asscalar(),
+                                                                          max_range.asscalar(),
+                                                                          min_calib_range=min_calib_range,
+                                                                          max_calib_range=max_calib_range)
+        assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np, atol = 1)
+        assert_almost_equal(min_output.asnumpy(), np.array([min_output_np]))
+        assert_almost_equal(max_output.asnumpy(), np.array([max_output_np]))
+
+    # test with symbol API.
+    check_requantize_with_symbol((3, 4, 10, 10))
+    check_requantize_with_symbol((32, 3, 23, 23))
+    check_requantize_with_symbol((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize_with_symbol((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+    # Test with nd array API
+    check_requantize((3, 4, 10, 10))
+    check_requantize((32, 3, 23, 23))
+    check_requantize((3, 4, 10, 10), min_calib_range=-1050.0, max_calib_range=1040.0)
+    check_requantize((32, 3, 23, 23), min_calib_range=-134.349, max_calib_range=523.43)
+
+
+def test_quantized_conv():
+    def check_quantized_conv(data_shape, kernel, num_filter, pad, stride, dilate, no_bias, qdtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_conv for native cpu since it is not supported yet')
+            return
+        elif is_test_for_mkldnn():
+            # (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
+            print('skipped testing quantized_conv for mkldnn cpu since it is a flaky case')
+            return
+        elif qdtype == 'uint8' and is_test_for_gpu():
+            print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
+            return
+        elif is_test_for_gpu() and len(data_shape) != 4:
+            print('skipped testing quantized_conv for gpu 5d layout since it is not supported yet')
+            return
+
+        # run fp32 conv
+        data = mx.sym.Variable(name='data', shape=data_shape, dtype='float32')
+        conv = mx.sym.Convolution(data=data, kernel=kernel, num_filter=num_filter, pad=pad, stride=stride,
+                                  dilate=dilate, no_bias=no_bias, cudnn_off=False, name='conv')
+        arg_shapes, _, _ = conv.infer_shape(data=data_shape)
+        arg_names = conv.list_arguments()
+        conv_exe_fp32 = conv._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qdtype == 'uint8':
+            data_low = 0.0
+            data_high = 127.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+        conv_exe_fp32.arg_dict[arg_names[0]][:] = mx.nd.random.uniform(low=data_low, high=data_high,
+                                                                       shape=data_shape).astype('int32')
+        conv_exe_fp32.arg_dict[arg_names[1]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                       shape=arg_shapes[1]).astype('int32')
+        if not no_bias:
+            conv_exe_fp32.arg_dict[arg_names[2]][:] = mx.nd.random.uniform(low=-127.0, high=127.0,
+                                                                           shape=arg_shapes[2]).astype('int32')
+        output = conv_exe_fp32.forward()[0]
+
+        # run quantized conv
+        qdata = mx.sym.Variable(name='qdata', shape=data_shape, dtype=qdtype)
+        qweight = mx.sym.Variable(name='qweight', dtype='int8')
+        min_data = mx.sym.Variable(name='min_data')
+        max_data = mx.sym.Variable(name='max_data')
+        min_weight = mx.sym.Variable(name='min_weight')
+        max_weight = mx.sym.Variable(name='max_weight')
+        quantized_conv = mx.sym.contrib.quantized_conv(data=qdata, weight=qweight, min_data=min_data,
+                                                       max_data=max_data, min_weight=min_weight,
+                                                       max_weight=max_weight, kernel=kernel,
+                                                       num_filter=num_filter, pad=pad, stride=stride,
+                                                       dilate=dilate, no_bias=no_bias)
+        qarg_names = quantized_conv.list_arguments()
+        type_dict = None
+        if not no_bias:
+            type_dict = {qarg_names[2]: 'int8'}
+        conv_exe_int8 = quantized_conv._simple_bind(ctx=mx.current_context(), type_dict=type_dict, grad_req='null')
+        conv_exe_int8.arg_dict[qarg_names[0]][:] = conv_exe_fp32.arg_dict[arg_names[0]].astype(qdtype)
+        conv_exe_int8.arg_dict[qarg_names[1]][:] = conv_exe_fp32.arg_dict[arg_names[1]].astype('int8')
+        quantized_range = 127.0
+        if no_bias:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = quantized_range
+        else:
+            conv_exe_int8.arg_dict[qarg_names[2]][:] = conv_exe_fp32.arg_dict[arg_names[2]].astype('int8')
+            conv_exe_int8.arg_dict[qarg_names[3]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[4]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[5]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[6]][:] = quantized_range
+            conv_exe_int8.arg_dict[qarg_names[7]][:] = -quantized_range
+            conv_exe_int8.arg_dict[qarg_names[8]][:] = quantized_range
+        qoutput, min_range, max_range = conv_exe_int8.forward()
+
+        if no_bias:
+            assert_almost_equal(output.asnumpy(), qoutput.asnumpy(), atol = 1)
+        else:
+            # with adding bias, accuracy loss should not be greater than one
+            diff = mx.nd.abs(output - qoutput.astype(output.dtype))
+            cond = mx.nd.lesser(2, diff).sum().asscalar()
+            assert cond == 0
+
+    for qdtype in ['int8', 'uint8']:
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), True, qdtype)
+        check_quantized_conv((3, 4, 28, 28), (3, 3), 128, (1, 1), (1, 1), (1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (1, 1, 1), True, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), False, qdtype)
+        check_quantized_conv((1, 3, 4, 28, 28), (1, 3, 3), 128, (1, 1, 1), (1, 1, 1), (2, 2, 2), True, qdtype)
+
+
+def test_quantized_elemwise_add():
+    def check_quantized_elemwise_add(data_shape, qtype):
+        if is_test_for_native_cpu():
+            print('skipped testing quantized_elemwise_add for native cpu since it is not supported yet')
+            return
+        elif qtype != 'uint8' and qtype != 'int8':
+            print('skipped testing quantized_elemwise_add for unsupported data type')
+            return
+        elif is_test_for_gpu():
+            print('skipped testing quantized_elemwise_add for gpu since it is not supported yet')
+            return
+
+        dataA = mx.sym.Variable(name='dataA', shape=data_shape, dtype='float32')
+        dataB = mx.sym.Variable(name='dataB', shape=data_shape, dtype='float32')
+        elemwise_add_fp32 = mx.sym.elemwise_add(dataA, dataB)
+        arg_names = elemwise_add_fp32.list_arguments()
+        elemwise_add_fp32_exe = elemwise_add_fp32._simple_bind(ctx=mx.current_context(), grad_req='null')
+        if qtype == 'uint8':
+            data_low = 0.0
+            data_high = 255.0
+        else:
+            data_low = -127.0
+            data_high = 127.0
+
+        dataA_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        dataB_val = mx.nd.random.uniform(low=data_low, high=data_high, shape=data_shape).astype('int32')
+        elemwise_add_fp32_exe.arg_dict[arg_names[0]][:] = dataA_val
+
+        elemwise_add_fp32_exe.arg_dict[arg_names[1]][:] = dataB_val
+
+        output = elemwise_add_fp32_exe.forward()[0]
+        print(output)
+        qdataA = mx.sym.Variable(name='qdataA', shape=data_shape, dtype=qtype)
+        qdataB = mx.sym.Variable(name='qdataB', shape=data_shape, dtype=qtype)
+        min_dataA = mx.sym.Variable(name='min_dataA', dtype='float32')
+        max_dataA = mx.sym.Variable(name='max_dataA', dtype='float32')
+        min_dataB = mx.sym.Variable(name='min_dataB', dtype='float32')
+        max_dataB = mx.sym.Variable(name='max_dataB', dtype='float32')
+        quantized_elemwise_add = mx.sym.contrib.quantized_elemwise_add(qdataA, qdataB, min_dataA, max_dataA, min_dataB, max_dataB)
+        elemwise_add_int8_exe = quantized_elemwise_add._simple_bind(ctx=mx.current_context(), grad_req='null')
+        qarg_names = quantized_elemwise_add.list_arguments()
+        elemwise_add_int8_exe.arg_dict[qarg_names[0]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[0]].astype(qtype)
+        elemwise_add_int8_exe.arg_dict[qarg_names[1]][:] = elemwise_add_fp32_exe.arg_dict[arg_names[1]].astype(qtype)
+        quantized_range = 127.0
+        elemwise_add_int8_exe.arg_dict[qarg_names[2]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[3]][:] = data_high
+        elemwise_add_int8_exe.arg_dict[qarg_names[4]][:] = data_low
+        elemwise_add_int8_exe.arg_dict[qarg_names[5]][:] = data_high
+        qoutput, min_range, max_range = elemwise_add_int8_exe.forward()
+        print(qoutput)
+        int8_rslt = qoutput.astype(output.dtype)*max_range/0x7fffffff
+        print(int8_rslt)
+        diff = mx.nd.abs(output - int8_rslt)
+        cond = mx.nd.lesser(2, diff).sum().asscalar()
+        assert cond == 0
+
+    for qtype in ['int8', 'uint8']:
+        check_quantized_elemwise_add((4, 6), qtype)
+        # check_quantized_elemwise_add((13, 74, 52), qtype)

Review comment:
       It was not re-enabled after debugging - thank you, fixed :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539946173



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to a (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)
+<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+
+Intel® oneDNN supports quantization with subgraph features on the Intel® CPU platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply the quantization flow to your project directly, please refer to [Optimize custom models with oneDNN backend](#TODO(agrygielski)).

Review comment:
       Removed - we will add a link to the article once it is published




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539946597



##########
File path: example/quantization/imagenet_gen_qsym_onednn.py
##########
@@ -0,0 +1,274 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import argparse
+import logging
+import os
+import re
+import sys
+from inspect import currentframe, getframeinfo
+
+import mxnet as mx
+from mxnet import gluon
+from mxnet.contrib.quantization import quantize_net
+from mxnet.gluon.data import DataLoader
+from mxnet.gluon.data.vision import transforms
+from mxnet.gluon.model_zoo.vision import get_model
+
+sys.path.append('../..')
+from tools.rec2idx import IndexCreator
+
+
+def download_calib_dataset(dataset_url, calib_dataset, logger=None):
+    if logger is not None:
+        logger.info('Downloading calibration dataset from %s to %s' % (dataset_url, calib_dataset))
+    mx.test_utils.download(dataset_url, calib_dataset)
+
+def get_from_gluon(model_name, classes=1000, logger=None):
+    dir_path = os.path.dirname(os.path.realpath(__file__))
+    model_path = os.path.join(dir_path, 'model')
+    if logger is not None:
+        logger.info('Converting model from Gluon-CV ModelZoo %s... into path %s' % (model_name, model_path))
+    net = get_model(name=model_name, classes=classes, pretrained=True)
+    prefix = os.path.join(model_path, model_name)
+    return net, prefix
+
+def regex_find_excluded_symbols(patterns_dict, model_name):
+    for key, value in patterns_dict.items():
+        if re.search(key, model_name) is not None:
+            return value
+    return None
+
+def get_exclude_symbols(model_name, exclude_first_conv):
+    # Grouped supported models at the time of commit:
+    # alexnet
+    # densenet121, densenet161
+    # densenet169, densenet201
+    # inceptionv3
+    # mobilenet0.25, mobilenet0.5, mobilenet0.75, mobilenet1.0,
+    # mobilenetv2_0.25, mobilenetv2_0.5, mobilenetv2_0.75, mobilenetv2_1.0
+    # resnet101_v1, resnet152_v1, resnet18_v1, resnet34_v1, resnet50_v1
+    # resnet101_v2, resnet152_v2, resnet18_v2, resnet34_v2, resnet50_v2
+    # squeezenet1.0, squeezenet1.1
+    # vgg11, vgg11_bn, vgg13, vgg13_bn, vgg16, vgg16_bn, vgg19, vgg19_bn
+    exclude_symbol_regex = {
+        'mobilenet[^v]': ['mobilenet_hybridsequential0_flatten0_flatten0', 'mobilenet_hybridsequential0_globalavgpool2d0_fwd'],
+        'mobilenetv2': ['mobilenetv2_hybridsequential1_flatten0_flatten0'],
+        # resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd is excluded for the sake of accuracy
+        'resnet.*v2': ['resnetv2_hybridsequential0_flatten0_flatten0', 'resnetv2_hybridsequential0_hybridsequential0_bottleneckv20_batchnorm0_fwd'],
+        'squeezenet1': ['squeezenet_hybridsequential1_flatten0_flatten0'],
+    }
+    excluded_sym_names = regex_find_excluded_symbols(exclude_symbol_regex, model_name)
+    if excluded_sym_names is None:
+        excluded_sym_names = []
+    if exclude_first_conv:
+        first_conv_regex = {
+            'alexnet': ['alexnet_hybridsequential0_conv2d0_fwd'],
+            'densenet': ['densenet_hybridsequential0_conv2d0_fwd'],
+            'inceptionv3': ['inception3_hybridsequential0_hybridsequential0_conv2d0_fwd'],
+            'mobilenet[^v]': ['mobilenet_hybridsequential0_conv2d0_fwd'],
+            'mobilenetv2': ['mobilenetv2_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v1': ['resnetv1_hybridsequential0_conv2d0_fwd'],
+            'resnet.*v2': ['resnetv2_hybridsequential0_conv2d0_fwd'],
+            'squeezenet1': ['squeezenet_hybridsequential0_conv2d0_fwd'],
+            'vgg': ['vgg_hybridsequential0_conv2d0_fwd'],
+        }
+        excluded_first_conv_sym_names = regex_find_excluded_symbols(first_conv_regex, model_name)
+        if excluded_first_conv_sym_names is None:
+            raise ValueError('Currently, model %s is not supported in this script' % model_name)
+        excluded_sym_names += excluded_first_conv_sym_names
+    return excluded_sym_names
+
+
+

Review comment:
       Done

##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to a (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)

Review comment:
       Right, done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-734875940


   Jenkins CI successfully triggered : [windows-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-733950316


   Hey @bgawrych , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [sanity, website, unix-cpu, clang, unix-gpu, centos-gpu, windows-cpu, miscellaneous, centos-cpu, edge, windows-gpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] bgawrych commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
bgawrych commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742029500


   @mxnet-bot run ci [unix-cpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-735751022


   Jenkins CI successfully triggered : [windows-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ciyongch commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
ciyongch commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536526830



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model with Intel® MKL-DNN to a (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on the Intel® CPU platform and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply the quantization flow to your project directly, please refer to [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from a FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If --no-pretrained is set, the
+                        model must be placed in the `model` directory next to
+                        this Python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       If enabled, will not download pretrained model from
+                        MXNet or Gluon-CV modelzoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating model, default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of input image separated by comma,
+                        default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  exclude quantizing the first conv layer since the
+                        input data may have negative values, which are not
+                        supported at the moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating calibration table
+                        for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take min and max
+                        values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate KL
+                        divergence of the fp32 output and quantized output for
+                        optimal thresholds. This mode is expected to produce
+                        the best inference accuracy of all three kinds of
+                        quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of the log output
+```
+
+A new benchmark script `launch_inference_mkldnn.sh` has been designed to launch performance benchmarks for float32 or int8 image-classification models with Intel® MKL-DNN.
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteration] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help                show this help message and exit
+  -s, --symbol_file         symbol file for benchmark, required
+  -b, --batch_size          inference batch size
+                            default: 64
+  -iter, --iteration        inference iteration
+                            default: 500
+  -ins, --instance          launch multi-instance inference
+                            default: one instance per socket
+  -c, --core                number of cores per instance
+                            default: all physical cores divided evenly among instances
+
+example: ResNet int8 performance benchmark on c5.24xlarge (dual-socket, 24 physical cores per socket).
+
+    bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for the throughput benchmark, each using 24 physical cores.
+```
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
       Ok, so can we add the list of models which are already supported and verified?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-743700693


   It's great that the CI passed.
   
   Thanks for the great efforts from the team to re-enable the quantization flow in MXNet 2.0 :)
   
   I am going to merge this PR. @anko-intel @bgawrych


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] xinyu-intel commented on a change in pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
xinyu-intel commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r539849585



##########
File path: example/quantization/README.md
##########
@@ -0,0 +1,194 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model with Intel® oneAPI Deep Neural Network Library (oneDNN) to a (U)INT8 model.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® oneDNN](#1)

Review comment:
       Since cuDNN quantization will not be re-enabled or introduced in this README, we can just remove this contents section.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #19587: [FEATURE] Restore Quantization API to MXNet

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#issuecomment-742480485


   Jenkins CI successfully triggered : [unix-gpu]


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org