Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/12/09 10:38:09 UTC

[GitHub] [incubator-mxnet] chinakook opened a new issue #19649: Results are significantly different between RTX 2080Ti and RTX 3090

chinakook opened a new issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649


    I built MXNet with CUDA 11.1 myself, and I found a significant difference between the results on an RTX 2080Ti and those on an RTX 3090.
    I tested with resnet18_v1 (modified to match torchvision); the results are as follows:
    MXNet 2.0 on RTX 3090:
   ```
      1.41843593e+00 -6.14944875e-01 -1.21827471e+00  1.47419822e+00
      1.08697571e-01 -1.53987074e+00 -2.19901204e-01  9.48053539e-01
      9.75863874e-01  1.70030773e+00  8.14817071e-01 -1.23302710e+00
      1.59906292e+00  6.93061709e-01 -1.53004932e+00 -1.63886517e-01
     -7.90785626e-02  2.69093782e-01 -6.79612219e-01  1.62834823e-01
      1.30419743e+00  3.55334133e-01  3.44635278e-01 -1.63632333e+00
     -1.83135128e+00 -2.71486902e+00 -1.90834343e+00 -1.56557214e+00
     -2.34904575e+00 -8.75294745e-01 -1.45051964e-02  2.31601214e+00]]
   ```
   
   MXNet 2.0 on RTX 2080Ti:
   ```
      1.41812360e+00 -6.14904046e-01 -1.21819317e+00  1.47481441e+00
      1.08835481e-01 -1.53912461e+00 -2.19649583e-01  9.48446751e-01
      9.76122081e-01  1.70034432e+00  8.15561593e-01 -1.23293436e+00
      1.59933698e+00  6.92907929e-01 -1.53025842e+00 -1.63300186e-01
     -7.87981078e-02  2.69501388e-01 -6.79563940e-01  1.62799448e-01
      1.30361092e+00  3.54955167e-01  3.44287097e-01 -1.63627052e+00
     -1.83101940e+00 -2.71485949e+00 -1.90862203e+00 -1.56534243e+00
     -2.34861779e+00 -8.75208437e-01 -1.46625079e-02  2.31575775e+00]]
   ```
   
   MXNet 2.0 on CPU:
   ```
      1.41812253e+00 -6.14904225e-01 -1.21819282e+00  1.47481489e+00
      1.08835749e-01 -1.53912461e+00 -2.19649076e-01  9.48446691e-01
      9.76122200e-01  1.70034420e+00  8.15561354e-01 -1.23293459e+00
      1.59933639e+00  6.92908108e-01 -1.53025806e+00 -1.63299382e-01
     -7.87984356e-02  2.69500166e-01 -6.79564059e-01  1.62798852e-01
      1.30361056e+00  3.54956239e-01  3.44287276e-01 -1.63627028e+00
     -1.83101881e+00 -2.71485925e+00 -1.90862203e+00 -1.56534243e+00
     -2.34861803e+00 -8.75208795e-01 -1.46625564e-02  2.31575823e+00]]
   ```
   
    Torch 1.7 on RTX 3090:
   ```
      1.41812313e+00 -6.14903867e-01 -1.21819305e+00  1.47481418e+00
      1.08835526e-01 -1.53912425e+00 -2.19649911e-01  9.48446572e-01
      9.76122499e-01  1.70034397e+00  8.15561354e-01 -1.23293447e+00
      1.59933650e+00  6.92907453e-01 -1.53025746e+00 -1.63299173e-01
     -7.87977725e-02  2.69501239e-01 -6.79563761e-01  1.62798911e-01
      1.30361116e+00  3.54956120e-01  3.44288558e-01 -1.63627124e+00
     -1.83101881e+00 -2.71485972e+00 -1.90862191e+00 -1.56534243e+00
     -2.34861827e+00 -8.75208020e-01 -1.46627314e-02  2.31575871e+00]]
   ```
   
    Torch 1.7 on RTX 2080Ti:
   ```
      1.41812313e+00 -6.14903808e-01 -1.21819329e+00  1.47481418e+00
      1.08835645e-01 -1.53912401e+00 -2.19649911e-01  9.48446631e-01
      9.76122320e-01  1.70034397e+00  8.15561354e-01 -1.23293436e+00
      1.59933674e+00  6.92907691e-01 -1.53025758e+00 -1.63299263e-01
     -7.87977204e-02  2.69501120e-01 -6.79563642e-01  1.62798882e-01
      1.30361140e+00  3.54956120e-01  3.44288498e-01 -1.63627124e+00
     -1.83101892e+00 -2.71485972e+00 -1.90862191e+00 -1.56534243e+00
     -2.34861827e+00 -8.75208080e-01 -1.46627687e-02  2.31575847e+00]]
   ```
   
    Torch 1.7 on CPU:
   ```
      1.41812289e+00 -6.14903927e-01 -1.21819293e+00  1.47481537e+00
      1.08835876e-01 -1.53912544e+00 -2.19649374e-01  9.48445439e-01
      9.76121962e-01  1.70034182e+00  8.15560222e-01 -1.23293531e+00
      1.59933591e+00  6.92908287e-01 -1.53025782e+00 -1.63299412e-01
     -7.87980631e-02  2.69500971e-01 -6.79563344e-01  1.62798971e-01
      1.30361187e+00  3.54957968e-01  3.44287753e-01 -1.63626969e+00
     -1.83101833e+00 -2.71485972e+00 -1.90862167e+00 -1.56534195e+00
     -2.34861779e+00 -8.75208795e-01 -1.46625713e-02  2.31575823e+00]]
   ```
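
    To quantify the divergence, here is a small NumPy sketch (mine, not part of the original report) that measures the gap between two of the printed rows; the two vectors below are copied from the first printed row of the MXNet 2.0 RTX 3090 and RTX 2080Ti outputs above.
    ```python
    import numpy as np

    # first printed row of the MXNet 2.0 results above
    a = np.array([1.41843593, -0.614944875, -1.21827471, 1.47419822])  # RTX 3090
    b = np.array([1.41812360, -0.614904046, -1.21819317, 1.47481441])  # RTX 2080Ti

    print("max abs diff:", np.max(np.abs(a - b)))              # ~6.2e-4 for these rows
    print("max rel diff:", np.max(np.abs(a - b) / np.abs(b)))  # ~4.2e-4
    ```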




[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significantly different between RTX 2080Ti and RTX 3090

chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751432013


    After more tests, I found that the result also varies on the RTX 2080Ti, on both MXNet 1.9.0 and MXNet 2.0.0.
    ~~The result has a difference of about 0.005 in a shallow layer; I expect the difference to grow in deeper layers.~~
   ```python
   import os
   # os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   import mxnet as mx
   import numpy as np
   from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   
    def test_resnet():
        ctx = mx.gpu(0)
        mx_model = resnet18_v1(pretrained=True, ctx=ctx)
        mx_model.hybridize()

        x_mx = mx.nd.ones(shape=(1, 3, 224, 224), ctx=ctx)

        # run only the first six feature blocks
        y_mx = mx_model.features[0:6](x_mx)

        # the sum is always 13064.977 on CPU; on RTX 2080Ti/RTX 3090 it varies
        # across runs (e.g. 13064.971, 13064.976) on both MXNet 1.9.0 and 2.0.0
        # unless MXNET_CUDNN_AUTOTUNE_DEFAULT=0 is set
        res = y_mx.asnumpy().sum()

        print(res)

    if __name__ == '__main__':
        test_resnet()
   ```
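
    A variant of the same check (my sketch, not from the issue) that compares the CPU and GPU sums within a single run, assuming a working CUDA build of MXNet:
    ```python
    import mxnet as mx
    from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1

    def partial_sum(ctx):
        model = resnet18_v1(pretrained=True, ctx=ctx)
        model.hybridize()
        x = mx.nd.ones(shape=(1, 3, 224, 224), ctx=ctx)
        # same first six feature blocks as in the script above
        return float(model.features[0:6](x).asnumpy().sum())

    cpu_res = partial_sum(mx.cpu())
    gpu_res = partial_sum(mx.gpu(0))
    print(cpu_res, gpu_res, abs(cpu_res - gpu_res))
    ```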




[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significantly different between RTX 2080Ti and RTX 3090

chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758663463


    @Neutron3529 Yes, v1.x is OK; MXNet 2 has this bug. I'll run more tests.




[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significantly different between RTX 2080Ti and RTX 3090

chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751424572


    The result also varies with mxnet_cu110-2.0.0b20201226.
   
   Result 1 on RTX 3090 GPU on mxnet_cu110-2.0.0b20201226
   ```
      1.41821623e+00 -6.14694595e-01 -1.21822190e+00  1.47472918e+00
      1.08678900e-01 -1.53905892e+00 -2.19664723e-01  9.48607504e-01
      9.76179004e-01  1.70066428e+00  8.15666854e-01 -1.23275781e+00
      1.59943473e+00  6.92619503e-01 -1.52998209e+00 -1.63329318e-01
     -7.86948949e-02  2.69214898e-01 -6.79625511e-01  1.63082540e-01
      1.30359614e+00  3.54878873e-01  3.44506621e-01 -1.63622832e+00
     -1.83121693e+00 -2.71499276e+00 -1.90867770e+00 -1.56530845e+00
      -2.34865284e+00 -8.75126600e-01 -1.44264027e-02  2.31574321e+00
    ```
   
   Result 2 on RTX 3090 GPU on mxnet_cu110-2.0.0b20201226
   ```
      1.41812336e+00 -6.14903927e-01 -1.21819293e+00  1.47481430e+00
      1.08835243e-01 -1.53912401e+00 -2.19649285e-01  9.48447049e-01
      9.76122022e-01  1.70034528e+00  8.15561593e-01 -1.23293483e+00
      1.59933603e+00  6.92907691e-01 -1.53025889e+00 -1.63300052e-01
     -7.87986293e-02  2.69500673e-01 -6.79565012e-01  1.62798882e-01
      1.30361140e+00  3.54955018e-01  3.44288290e-01 -1.63627052e+00
     -1.83101904e+00 -2.71485925e+00 -1.90862215e+00 -1.56534243e+00
     -2.34861803e+00 -8.75208020e-01 -1.46629252e-02  2.31575775e+00
   ```
   
    Here is a test script for MXNet 2.0.0 master: it uses resnet18_v1 to compare the results on CPU and GPU and reproduce the problem. It sometimes produces identical results, so you may need to run it many times to catch a difference (a small driver sketch follows the listing).
   ```python
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   
   # coding: utf-8
   # pylint: disable= arguments-differ,unused-argument,missing-docstring,too-many-lines
   """ResNets, implemented in Gluon."""
   from __future__ import division
   import gluoncv as gcv
   
   __all__ = ['ResNetV1', 'ResNetV2',
              'BasicBlockV1', 'BasicBlockV2',
              'BottleneckV1', 'BottleneckV2',
              'resnet18_v1', 'resnet34_v1', 'resnet50_v1', 'resnet101_v1', 'resnet152_v1',
              'resnet18_v2', 'resnet34_v2', 'resnet50_v2', 'resnet101_v2', 'resnet152_v2',
              'se_resnet18_v1', 'se_resnet34_v1', 'se_resnet50_v1',
              'se_resnet101_v1', 'se_resnet152_v1',
              'se_resnet18_v2', 'se_resnet34_v2', 'se_resnet50_v2',
              'se_resnet101_v2', 'se_resnet152_v2',
              'get_resnet']
   
   from mxnet.context import cpu
   from mxnet.gluon.block import HybridBlock
   from mxnet.gluon import nn
   from mxnet.gluon.nn import BatchNorm
   from mxnet import base
   from mxnet.util import is_np_array
   
   # Helpers
   def _conv3x3(channels, stride, in_channels):
       return nn.Conv2D(channels, kernel_size=3, strides=stride, padding=1,
                        use_bias=False, in_channels=in_channels)
   
   
   # Blocks
   class BasicBlockV1(HybridBlock):
       r"""BasicBlock V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(_conv3x3(channels, stride, in_channels))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels, 1, channels))
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(residual+x, act_type='relu')
   
           return x
   
   
   class BottleneckV1(HybridBlock):
       r"""Bottleneck V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels//4, stride, channels//4))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x + residual, act_type='relu')
           return x
   
   
   class BasicBlockV2(HybridBlock):
       r"""BasicBlock V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False,
                    norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = _conv3x3(channels, stride, in_channels)
           if not last_gamma:
               self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn2 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels, 1, channels)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   class BottleneckV2(HybridBlock):
       r"""Bottleneck V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False)
           self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels//4, stride, channels//4)
           if not last_gamma:
               self.bn3 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn3 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv3 = nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           x = self.bn3(x)
           x = act(x, act_type='relu')
           x = self.conv3(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   # Nets
   class ResNetV1(HybridBlock):
       r"""ResNet V1 model from
       `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV1, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=channels[i],
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           self.features.add(nn.GlobalAvgPool2D())
   
           self.output = nn.Dense(classes, in_units=channels[-1])
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
   
           return x
   
   
   class ResNetV2(HybridBlock):
       r"""ResNet V2 model from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV2, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           self.features.add(norm_layer(scale=False, center=False,
                                        **({} if norm_kwargs is None else norm_kwargs)))
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           in_channels = channels[0]
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=in_channels,
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
               in_channels = channels[i+1]
           self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.features.add(nn.Activation('relu'))
           self.features.add(nn.GlobalAvgPool2D())
           self.features.add(nn.Flatten())
   
           self.output = nn.Dense(classes, in_units=in_channels)
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
           return x
   
   
   # Specification
   resnet_spec = {18: ('basic_block', [2, 2, 2, 2], [64, 64, 128, 256, 512]),
                  34: ('basic_block', [3, 4, 6, 3], [64, 64, 128, 256, 512]),
                  50: ('bottle_neck', [3, 4, 6, 3], [64, 256, 512, 1024, 2048]),
                  101: ('bottle_neck', [3, 4, 23, 3], [64, 256, 512, 1024, 2048]),
                  152: ('bottle_neck', [3, 8, 36, 3], [64, 256, 512, 1024, 2048])}
   
   resnet_net_versions = [ResNetV1, ResNetV2]
   resnet_block_versions = [{'basic_block': BasicBlockV1, 'bottle_neck': BottleneckV1},
                            {'basic_block': BasicBlockV2, 'bottle_neck': BottleneckV2}]
   
   
   # Constructor
   def get_resnet(version, num_layers, pretrained=False, ctx=cpu(),
                  root='~/.mxnet/models', use_se=False, **kwargs):
       r"""ResNet V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       ResNet V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       version : int
           Version of ResNet. Options are 1, 2.
       num_layers : int
           Numbers of layers. Options are 18, 34, 50, 101, 152.
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default $MXNET_HOME/models
           Location for keeping the model parameters.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       assert num_layers in resnet_spec, \
           "Invalid number of layers: %d. Options are %s"%(
               num_layers, str(resnet_spec.keys()))
       block_type, layers, channels = resnet_spec[num_layers]
       assert 1 <= version <= 2, \
           "Invalid resnet version: %d. Options are 1 and 2."%version
       resnet_class = resnet_net_versions[version-1]
       block_class = resnet_block_versions[version-1][block_type]
       net = resnet_class(block_class, layers, channels, use_se=use_se, **kwargs)
        if pretrained:
            from gluoncv.model_zoo.model_store import get_model_file
           if not use_se:
               net.load_parameters(get_model_file('resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           else:
               net.load_parameters(get_model_file('se_resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           from gluoncv.data import ImageNet1kAttr
           attrib = ImageNet1kAttr()
           net.synset = attrib.synset
           net.classes = attrib.classes
           net.classes_long = attrib.classes_long
       return net
   
   def resnet18_v1(**kwargs):
       r"""ResNet-18 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=False, **kwargs)
   
   def resnet34_v1(**kwargs):
       r"""ResNet-34 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=False, **kwargs)
   
   def resnet50_v1(**kwargs):
       r"""ResNet-50 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=False, **kwargs)
   
   def resnet101_v1(**kwargs):
       r"""ResNet-101 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=False, **kwargs)
   
   def resnet152_v1(**kwargs):
       r"""ResNet-152 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=False, **kwargs)
   
   def resnet18_v2(**kwargs):
       r"""ResNet-18 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=False, **kwargs)
   
   def resnet34_v2(**kwargs):
       r"""ResNet-34 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=False, **kwargs)
   
   def resnet50_v2(**kwargs):
       r"""ResNet-50 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=False, **kwargs)
   
   def resnet101_v2(**kwargs):
       r"""ResNet-101 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=False, **kwargs)
   
   def resnet152_v2(**kwargs):
       r"""ResNet-152 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=False, **kwargs)
   
   # SE-ResNet
   def se_resnet18_v1(**kwargs):
       r"""SE-ResNet-18 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=True, **kwargs)
   
   def se_resnet34_v1(**kwargs):
       r"""SE-ResNet-34 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=True, **kwargs)
   
   def se_resnet50_v1(**kwargs):
       r"""SE-ResNet-50 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=True, **kwargs)
   
   def se_resnet101_v1(**kwargs):
       r"""SE-ResNet-101 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=True, **kwargs)
   
   def se_resnet152_v1(**kwargs):
       r"""SE-ResNet-152 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=True, **kwargs)
   
   def se_resnet18_v2(**kwargs):
       r"""SE-ResNet-18 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=True, **kwargs)
   
   def se_resnet34_v2(**kwargs):
       r"""SE-ResNet-34 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=True, **kwargs)
   
   def se_resnet50_v2(**kwargs):
       r"""SE-ResNet-50 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=True, **kwargs)
   
   def se_resnet101_v2(**kwargs):
       r"""SE-ResNet-101 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=True, **kwargs)
   
   def se_resnet152_v2(**kwargs):
       r"""SE-ResNet-152 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=True, **kwargs)
   
   ```
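
    As mentioned above, here is a minimal driver sketch (mine; the module above is assumed to be saved as resnet_gcv.py, a hypothetical file name) for one CPU-vs-GPU comparison; run the script repeatedly to catch the intermittent mismatch:
    ```python
    import mxnet as mx
    import numpy as np
    from resnet_gcv import resnet18_v1  # module pasted above; file name is assumed

    x = mx.nd.ones(shape=(1, 3, 224, 224))
    y_cpu = resnet18_v1(pretrained=True, ctx=mx.cpu())(x).asnumpy()
    y_gpu = resnet18_v1(pretrained=True, ctx=mx.gpu(0))(x.copyto(mx.gpu(0))).asnumpy()

    # on a matching run this stays tiny; on a diverging run it reaches roughly
    # the 1e-4 level seen in the printed outputs above
    print("max abs diff:", np.max(np.abs(y_cpu - y_gpu)))
    ```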




[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758663463


   @Neutron3529 Yes, v1.x is OK. MXNet 2 has this bug.




[GitHub] [incubator-mxnet] szha commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
szha commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-741879774


    @chinakook Thanks for reporting. How did you produce these results?




[GitHub] [incubator-mxnet] TristonC commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
TristonC commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-759097811


    TF32 is on by default starting with MXNet 1.8. PyTorch may have TF32 off by default.
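    
    If TF32 is the culprit, forcing full float32 math should make the RTX 3090 line up with the RTX 2080Ti. A minimal sketch from the PyTorch side; these `torch.backends` flags exist in PyTorch 1.7+. The thread does not name an equivalent MXNet switch, so none is shown here.
    ```python
    import torch

    # These flags control TF32 use on Ampere GPUs (PyTorch >= 1.7).
    # Setting both to False forces matmuls and cuDNN convolutions to run
    # in full float32, removing the TF32 rounding that only the 3090 has.
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    ```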




[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-742493093


    > @chinakook Thanks for reporting. How did you produce these results?
    
    I will run more tests and then paste the test code here.




[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751439240


    A PyTorch test case. PyTorch shows a smaller difference between the 2080Ti and the 3090; MXNet, however, can differ by up to 0.3 in some cases.
   ```python
    import torch
    import torchvision as tv
    torch.backends.cudnn.benchmark = True
   
       model = tv.models.resnet18(pretrained=True)
       model.cuda(0)
       model.eval()
   
       # y is always 948.1921 on CPU
    # y is always 948.1919 on RTX2080Ti whether cudnn.benchmark is True or False
       # y is 948.19165 on RTX3090 when cudnn.benchmark=False
       # y varies on RTX3090 when cudnn.benchmark=True: 948.19147, 948.1919
       x = torch.ones(1,3,224,224).cuda(0)
       y = model(x)
       y = y.abs().sum()
       print(y.detach().cpu().numpy())
   ```
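    
    To separate TF32 and algorithm-selection effects from a genuine bug, a float64 run gives a higher-precision reference, since float64 convolutions have no TF32 path. A sketch under the same setup as above:
    ```python
    import torch
    import torchvision as tv

    model64 = tv.models.resnet18(pretrained=True).double().cuda(0)
    model64.eval()

    # float64 input; the result should be stable across 2080Ti and 3090
    # and can serve as a reference for the float32 numbers above.
    x64 = torch.ones(1, 3, 224, 224, dtype=torch.float64).cuda(0)
    y64 = model64(x64).abs().sum()
    print(y64.detach().cpu().numpy())
    ```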




[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751424572


    The result also varies with mxnet_cu110-2.0.0b20201226.
   
   Result 1 on RTX 3090 on mxnet_cu110-2.0.0b20201226
   ```
      1.41821623e+00 -6.14694595e-01 -1.21822190e+00  1.47472918e+00
      1.08678900e-01 -1.53905892e+00 -2.19664723e-01  9.48607504e-01
      9.76179004e-01  1.70066428e+00  8.15666854e-01 -1.23275781e+00
      1.59943473e+00  6.92619503e-01 -1.52998209e+00 -1.63329318e-01
     -7.86948949e-02  2.69214898e-01 -6.79625511e-01  1.63082540e-01
      1.30359614e+00  3.54878873e-01  3.44506621e-01 -1.63622832e+00
     -1.83121693e+00 -2.71499276e+00 -1.90867770e+00 -1.56530845e+00
     -2.34865284e+00 -8.75126600e-01 -1.44264027e-02  2.31574321e+00
   
   ```
   
   Result 2 on RTX 3090 on mxnet_cu110-2.0.0b20201226
   ```
      1.41812336e+00 -6.14903927e-01 -1.21819293e+00  1.47481430e+00
      1.08835243e-01 -1.53912401e+00 -2.19649285e-01  9.48447049e-01
      9.76122022e-01  1.70034528e+00  8.15561593e-01 -1.23293483e+00
      1.59933603e+00  6.92907691e-01 -1.53025889e+00 -1.63300052e-01
     -7.87986293e-02  2.69500673e-01 -6.79565012e-01  1.62798882e-01
      1.30361140e+00  3.54955018e-01  3.44288290e-01 -1.63627052e+00
     -1.83101904e+00 -2.71485925e+00 -1.90862215e+00 -1.56534243e+00
     -2.34861803e+00 -8.75208020e-01 -1.46629252e-02  2.31575775e+00
   ```
   
    Test script for MXNet 2.0.0 master; it uses resnet18_v1 to compare the outputs on CPU and GPU.
   ```python
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   
   # coding: utf-8
   # pylint: disable= arguments-differ,unused-argument,missing-docstring,too-many-lines
   """ResNets, implemented in Gluon."""
   from __future__ import division
   import gluoncv as gcv
   
   __all__ = ['ResNetV1', 'ResNetV2',
              'BasicBlockV1', 'BasicBlockV2',
              'BottleneckV1', 'BottleneckV2',
              'resnet18_v1', 'resnet34_v1', 'resnet50_v1', 'resnet101_v1', 'resnet152_v1',
              'resnet18_v2', 'resnet34_v2', 'resnet50_v2', 'resnet101_v2', 'resnet152_v2',
              'se_resnet18_v1', 'se_resnet34_v1', 'se_resnet50_v1',
              'se_resnet101_v1', 'se_resnet152_v1',
              'se_resnet18_v2', 'se_resnet34_v2', 'se_resnet50_v2',
              'se_resnet101_v2', 'se_resnet152_v2',
              'get_resnet']
   
   from mxnet.context import cpu
   from mxnet.gluon.block import HybridBlock
   from mxnet.gluon import nn
   from mxnet.gluon.nn import BatchNorm
   from mxnet import base
   from mxnet.util import is_np_array
   
   # Helpers
   def _conv3x3(channels, stride, in_channels):
       return nn.Conv2D(channels, kernel_size=3, strides=stride, padding=1,
                        use_bias=False, in_channels=in_channels)
   
   
   # Blocks
   class BasicBlockV1(HybridBlock):
       r"""BasicBlock V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(_conv3x3(channels, stride, in_channels))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels, 1, channels))
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(residual+x, act_type='relu')
   
           return x
   
   
   class BottleneckV1(HybridBlock):
       r"""Bottleneck V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels//4, stride, channels//4))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x + residual, act_type='relu')
           return x
   
   
   class BasicBlockV2(HybridBlock):
       r"""BasicBlock V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False,
                    norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = _conv3x3(channels, stride, in_channels)
           if not last_gamma:
               self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn2 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels, 1, channels)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   class BottleneckV2(HybridBlock):
       r"""Bottleneck V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False)
           self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels//4, stride, channels//4)
           if not last_gamma:
               self.bn3 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn3 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv3 = nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           x = self.bn3(x)
           x = act(x, act_type='relu')
           x = self.conv3(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   # Nets
   class ResNetV1(HybridBlock):
       r"""ResNet V1 model from
       `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV1, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=channels[i],
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           self.features.add(nn.GlobalAvgPool2D())
   
           self.output = nn.Dense(classes, in_units=channels[-1])
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
   
           return x
   
   
   class ResNetV2(HybridBlock):
       r"""ResNet V2 model from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
            Class for the residual block. Options are BasicBlockV2, BottleneckV2.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV2, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           self.features.add(norm_layer(scale=False, center=False,
                                        **({} if norm_kwargs is None else norm_kwargs)))
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           in_channels = channels[0]
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=in_channels,
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
               in_channels = channels[i+1]
           self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.features.add(nn.Activation('relu'))
           self.features.add(nn.GlobalAvgPool2D())
           self.features.add(nn.Flatten())
   
           self.output = nn.Dense(classes, in_units=in_channels)
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
           return x
   
   
   # Specification
   resnet_spec = {18: ('basic_block', [2, 2, 2, 2], [64, 64, 128, 256, 512]),
                  34: ('basic_block', [3, 4, 6, 3], [64, 64, 128, 256, 512]),
                  50: ('bottle_neck', [3, 4, 6, 3], [64, 256, 512, 1024, 2048]),
                  101: ('bottle_neck', [3, 4, 23, 3], [64, 256, 512, 1024, 2048]),
                  152: ('bottle_neck', [3, 8, 36, 3], [64, 256, 512, 1024, 2048])}
   
   resnet_net_versions = [ResNetV1, ResNetV2]
   resnet_block_versions = [{'basic_block': BasicBlockV1, 'bottle_neck': BottleneckV1},
                            {'basic_block': BasicBlockV2, 'bottle_neck': BottleneckV2}]
   
   
   # Constructor
   def get_resnet(version, num_layers, pretrained=False, ctx=cpu(),
                  root='~/.mxnet/models', use_se=False, **kwargs):
       r"""ResNet V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       ResNet V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       version : int
           Version of ResNet. Options are 1, 2.
       num_layers : int
           Numbers of layers. Options are 18, 34, 50, 101, 152.
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default $MXNET_HOME/models
           Location for keeping the model parameters.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       assert num_layers in resnet_spec, \
           "Invalid number of layers: %d. Options are %s"%(
               num_layers, str(resnet_spec.keys()))
       block_type, layers, channels = resnet_spec[num_layers]
       assert 1 <= version <= 2, \
           "Invalid resnet version: %d. Options are 1 and 2."%version
       resnet_class = resnet_net_versions[version-1]
       block_class = resnet_block_versions[version-1][block_type]
       net = resnet_class(block_class, layers, channels, use_se=use_se, **kwargs)
        if pretrained:
            from gluoncv.model_zoo.model_store import get_model_file
           if not use_se:
               net.load_parameters(get_model_file('resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           else:
               net.load_parameters(get_model_file('se_resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           from gluoncv.data import ImageNet1kAttr
           attrib = ImageNet1kAttr()
           net.synset = attrib.synset
           net.classes = attrib.classes
           net.classes_long = attrib.classes_long
       return net
   
   def resnet18_v1(**kwargs):
       r"""ResNet-18 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=False, **kwargs)
   
   def resnet34_v1(**kwargs):
       r"""ResNet-34 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=False, **kwargs)
   
   def resnet50_v1(**kwargs):
       r"""ResNet-50 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=False, **kwargs)
   
   def resnet101_v1(**kwargs):
       r"""ResNet-101 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=False, **kwargs)
   
   def resnet152_v1(**kwargs):
       r"""ResNet-152 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=False, **kwargs)
   
   def resnet18_v2(**kwargs):
       r"""ResNet-18 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=False, **kwargs)
   
   def resnet34_v2(**kwargs):
       r"""ResNet-34 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=False, **kwargs)
   
   def resnet50_v2(**kwargs):
       r"""ResNet-50 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=False, **kwargs)
   
   def resnet101_v2(**kwargs):
       r"""ResNet-101 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=False, **kwargs)
   
   def resnet152_v2(**kwargs):
       r"""ResNet-152 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=False, **kwargs)
   
   # SE-ResNet
   def se_resnet18_v1(**kwargs):
       r"""SE-ResNet-18 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=True, **kwargs)
   
   def se_resnet34_v1(**kwargs):
       r"""SE-ResNet-34 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=True, **kwargs)
   
   def se_resnet50_v1(**kwargs):
       r"""SE-ResNet-50 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=True, **kwargs)
   
   def se_resnet101_v1(**kwargs):
       r"""SE-ResNet-101 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=True, **kwargs)
   
   def se_resnet152_v1(**kwargs):
       r"""SE-ResNet-152 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=True, **kwargs)
   
   def se_resnet18_v2(**kwargs):
       r"""SE-ResNet-18 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=True, **kwargs)
   
   def se_resnet34_v2(**kwargs):
       r"""SE-ResNet-34 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=True, **kwargs)
   
   def se_resnet50_v2(**kwargs):
       r"""SE-ResNet-50 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=True, **kwargs)
   
   def se_resnet101_v2(**kwargs):
       r"""SE-ResNet-101 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=True, **kwargs)
   
   def se_resnet152_v2(**kwargs):
       r"""SE-ResNet-152 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=True, **kwargs)
   
   ```
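    
    The script above only defines the networks. A driver along the following lines would reproduce the per-class outputs quoted earlier; the parameter file name is hypothetical, standing in for the torchvision-converted weights mentioned at the top of this issue, which are not attached here.
    ```python
    import mxnet as mx
    import numpy as np

    # Assumes this runs in the same file, so resnet18_v1 from above is in scope.
    ctx = mx.gpu(0)
    net = resnet18_v1(pretrained=False)
    net.hybridize()
    # Hypothetical file name for the weights converted from torchvision.
    net.load_parameters('resnet18_from_torchvision.params', ctx=ctx)

    x = mx.nd.ones((1, 3, 224, 224), ctx=ctx)
    np.set_printoptions(precision=8)
    print(net(x).asnumpy())
    ```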




[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751424572


    The result also varies with mxnet_cu110-2.0.0b20201226.
    A minimal test case to reproduce it:
   ```python
   import os
   # os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   import mxnet as mx
   import numpy as np
   from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   
    def test_resnet():
       ctx = mx.gpu(0)
       mx_model = resnet18_v1(pretrained=False)
       mx_model.hybridize()
       mx.random.seed(22)
       mx_model.initialize()
   
       mx_model.reset_ctx(ctx=ctx)
   
       np.random.seed(115)
       x = np.random.uniform(size=(1,3,224,224)).astype(np.float32)
   
       x_mx = mx.nd.array(x, ctx=ctx)
   
       y_mx = mx_model(x_mx)
   
       # the res is -1219.706 on RTX3090 with MXNET_CUDNN_AUTOTUNE_DEFAULT=0
       # the res varies on RTX3090 GPU without MXNET_CUDNN_AUTOTUNE_DEFAULT=0: -1219.7754, -1220.0055, -1220.0052, -1220.0051
       # the res is -1220.0052 on RTX2080Ti
       # the res is -1220.0062 on CPU
       res = y_mx.asnumpy().sum()
   
       print(res)
   
   if __name__ == '__main__':
        test_resnet()
   ```
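    
    A tolerance check makes the CPU/GPU comparison concrete. TF32 keeps only about 10 mantissa bits, so agreement to roughly 1e-2 relative is the best one can expect while it is active, whereas full float32 convolutions normally agree to about 1e-5. A sketch built on the test above:
    ```python
    import mxnet as mx
    import numpy as np
    from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1

    def outputs_on(ctx):
        net = resnet18_v1(pretrained=False)
        net.hybridize()
        mx.random.seed(22)
        net.initialize()        # weights created with a fixed seed
        net.reset_ctx(ctx=ctx)  # then moved, so both runs share weights
        np.random.seed(115)
        x = np.random.uniform(size=(1, 3, 224, 224)).astype(np.float32)
        return net(mx.nd.array(x, ctx=ctx)).asnumpy()

    cpu_out = outputs_on(mx.cpu())
    gpu_out = outputs_on(mx.gpu(0))
    # Loose tolerance that TF32 can meet; a strict float32-level check
    # such as rtol=1e-5 is expected to fail on the 3090 with TF32 on.
    np.testing.assert_allclose(gpu_out, cpu_out, rtol=1e-2, atol=1e-2)
    ```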




[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751424572


   The result also varies with mxnet_cu110-2.0.0b20201226.
   
   Result 1 on RTX 3090 with mxnet_cu110-2.0.0b20201226
   ```
      1.41821623e+00 -6.14694595e-01 -1.21822190e+00  1.47472918e+00
      1.08678900e-01 -1.53905892e+00 -2.19664723e-01  9.48607504e-01
      9.76179004e-01  1.70066428e+00  8.15666854e-01 -1.23275781e+00
      1.59943473e+00  6.92619503e-01 -1.52998209e+00 -1.63329318e-01
     -7.86948949e-02  2.69214898e-01 -6.79625511e-01  1.63082540e-01
      1.30359614e+00  3.54878873e-01  3.44506621e-01 -1.63622832e+00
     -1.83121693e+00 -2.71499276e+00 -1.90867770e+00 -1.56530845e+00
     -2.34865284e+00 -8.75126600e-01 -1.44264027e-02  2.31574321e+00
   ```
   
   Result 2 on RTX 3090 with mxnet_cu110-2.0.0b20201226
   ```
      1.41812336e+00 -6.14903927e-01 -1.21819293e+00  1.47481430e+00
      1.08835243e-01 -1.53912401e+00 -2.19649285e-01  9.48447049e-01
      9.76122022e-01  1.70034528e+00  8.15561593e-01 -1.23293483e+00
      1.59933603e+00  6.92907691e-01 -1.53025889e+00 -1.63300052e-01
     -7.87986293e-02  2.69500673e-01 -6.79565012e-01  1.62798882e-01
      1.30361140e+00  3.54955018e-01  3.44288290e-01 -1.63627052e+00
     -1.83101904e+00 -2.71485925e+00 -1.90862215e+00 -1.56534243e+00
     -2.34861803e+00 -8.75208020e-01 -1.46629252e-02  2.31575775e+00
   ```
   
   Test script for MXNet 2.0.0 master; it uses resnet18_v1 to compare the results on CPU and GPU to pin down the problem.
   ```python
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   
   # coding: utf-8
   # pylint: disable= arguments-differ,unused-argument,missing-docstring,too-many-lines
   """ResNets, implemented in Gluon."""
   from __future__ import division
   import gluoncv as gcv
   
   __all__ = ['ResNetV1', 'ResNetV2',
              'BasicBlockV1', 'BasicBlockV2',
              'BottleneckV1', 'BottleneckV2',
              'resnet18_v1', 'resnet34_v1', 'resnet50_v1', 'resnet101_v1', 'resnet152_v1',
              'resnet18_v2', 'resnet34_v2', 'resnet50_v2', 'resnet101_v2', 'resnet152_v2',
              'se_resnet18_v1', 'se_resnet34_v1', 'se_resnet50_v1',
              'se_resnet101_v1', 'se_resnet152_v1',
              'se_resnet18_v2', 'se_resnet34_v2', 'se_resnet50_v2',
              'se_resnet101_v2', 'se_resnet152_v2',
              'get_resnet']
   
   from mxnet.context import cpu
   from mxnet.gluon.block import HybridBlock
   from mxnet.gluon import nn
   from mxnet.gluon.nn import BatchNorm
   from mxnet import base
   from mxnet.util import is_np_array
   
   # Helpers
   def _conv3x3(channels, stride, in_channels):
       return nn.Conv2D(channels, kernel_size=3, strides=stride, padding=1,
                        use_bias=False, in_channels=in_channels)
   
   
   # Blocks
   class BasicBlockV1(HybridBlock):
       r"""BasicBlock V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(_conv3x3(channels, stride, in_channels))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels, 1, channels))
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(residual+x, act_type='relu')
   
           return x
   
   
   class BottleneckV1(HybridBlock):
       r"""Bottleneck V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels//4, stride, channels//4))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x + residual, act_type='relu')
           return x
   
   
   class BasicBlockV2(HybridBlock):
       r"""BasicBlock V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False,
                    norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = _conv3x3(channels, stride, in_channels)
           if not last_gamma:
               self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn2 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels, 1, channels)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   class BottleneckV2(HybridBlock):
       r"""Bottleneck V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False)
           self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels//4, stride, channels//4)
           if not last_gamma:
               self.bn3 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn3 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv3 = nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           x = self.bn3(x)
           x = act(x, act_type='relu')
           x = self.conv3(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   # Nets
   class ResNetV1(HybridBlock):
       r"""ResNet V1 model from
       `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV1, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=channels[i],
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           self.features.add(nn.GlobalAvgPool2D())
   
           self.output = nn.Dense(classes, in_units=channels[-1])
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
   
           return x
   
   
   class ResNetV2(HybridBlock):
       r"""ResNet V2 model from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
            Class for the residual block. Options are BasicBlockV2, BottleneckV2.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV2, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           self.features.add(norm_layer(scale=False, center=False,
                                        **({} if norm_kwargs is None else norm_kwargs)))
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           in_channels = channels[0]
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=in_channels,
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
               in_channels = channels[i+1]
           self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.features.add(nn.Activation('relu'))
           self.features.add(nn.GlobalAvgPool2D())
           self.features.add(nn.Flatten())
   
           self.output = nn.Dense(classes, in_units=in_channels)
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
           return x
   
   
   # Specification
   resnet_spec = {18: ('basic_block', [2, 2, 2, 2], [64, 64, 128, 256, 512]),
                  34: ('basic_block', [3, 4, 6, 3], [64, 64, 128, 256, 512]),
                  50: ('bottle_neck', [3, 4, 6, 3], [64, 256, 512, 1024, 2048]),
                  101: ('bottle_neck', [3, 4, 23, 3], [64, 256, 512, 1024, 2048]),
                  152: ('bottle_neck', [3, 8, 36, 3], [64, 256, 512, 1024, 2048])}
   
   resnet_net_versions = [ResNetV1, ResNetV2]
   resnet_block_versions = [{'basic_block': BasicBlockV1, 'bottle_neck': BottleneckV1},
                            {'basic_block': BasicBlockV2, 'bottle_neck': BottleneckV2}]
   
   
   # Constructor
   def get_resnet(version, num_layers, pretrained=False, ctx=cpu(),
                  root='~/.mxnet/models', use_se=False, **kwargs):
       r"""ResNet V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       ResNet V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       version : int
           Version of ResNet. Options are 1, 2.
       num_layers : int
           Numbers of layers. Options are 18, 34, 50, 101, 152.
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default $MXNET_HOME/models
           Location for keeping the model parameters.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       assert num_layers in resnet_spec, \
           "Invalid number of layers: %d. Options are %s"%(
               num_layers, str(resnet_spec.keys()))
       block_type, layers, channels = resnet_spec[num_layers]
       assert 1 <= version <= 2, \
           "Invalid resnet version: %d. Options are 1 and 2."%version
       resnet_class = resnet_net_versions[version-1]
       block_class = resnet_block_versions[version-1][block_type]
       net = resnet_class(block_class, layers, channels, use_se=use_se, **kwargs)
       if pretrained:
           from gluoncv.model_zoo.model_store import get_model_file
           if not use_se:
               net.load_parameters(get_model_file('resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           else:
               net.load_parameters(get_model_file('se_resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           from gluoncv.data import ImageNet1kAttr
           attrib = ImageNet1kAttr()
           net.synset = attrib.synset
           net.classes = attrib.classes
           net.classes_long = attrib.classes_long
       return net
   
   def resnet18_v1(**kwargs):
       r"""ResNet-18 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=False, **kwargs)
   
   def resnet34_v1(**kwargs):
       r"""ResNet-34 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=False, **kwargs)
   
   def resnet50_v1(**kwargs):
       r"""ResNet-50 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=False, **kwargs)
   
   def resnet101_v1(**kwargs):
       r"""ResNet-101 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=False, **kwargs)
   
   def resnet152_v1(**kwargs):
       r"""ResNet-152 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=False, **kwargs)
   
   def resnet18_v2(**kwargs):
       r"""ResNet-18 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=False, **kwargs)
   
   def resnet34_v2(**kwargs):
       r"""ResNet-34 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=False, **kwargs)
   
   def resnet50_v2(**kwargs):
       r"""ResNet-50 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=False, **kwargs)
   
   def resnet101_v2(**kwargs):
       r"""ResNet-101 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=False, **kwargs)
   
   def resnet152_v2(**kwargs):
       r"""ResNet-152 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=False, **kwargs)
   
   # SE-ResNet
   def se_resnet18_v1(**kwargs):
       r"""SE-ResNet-18 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=True, **kwargs)
   
   def se_resnet34_v1(**kwargs):
       r"""SE-ResNet-34 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=True, **kwargs)
   
   def se_resnet50_v1(**kwargs):
       r"""SE-ResNet-50 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=True, **kwargs)
   
   def se_resnet101_v1(**kwargs):
       r"""SE-ResNet-101 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=True, **kwargs)
   
   def se_resnet152_v1(**kwargs):
       r"""SE-ResNet-152 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=True, **kwargs)
   
   def se_resnet18_v2(**kwargs):
       r"""SE-ResNet-18 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=True, **kwargs)
   
   def se_resnet34_v2(**kwargs):
       r"""SE-ResNet-34 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=True, **kwargs)
   
   def se_resnet50_v2(**kwargs):
       r"""SE-ResNet-50 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=True, **kwargs)
   
   def se_resnet101_v2(**kwargs):
       r"""SE-ResNet-101 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=True, **kwargs)
   
   def se_resnet152_v2(**kwargs):
       r"""SE-ResNet-152 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=True, **kwargs)
   
   ```
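
   A minimal driver sketch for the module above, mirroring the earlier minimal test case (append it to the end of the pasted file so `resnet18_v1` refers to the version defined above):
   ```python
   import mxnet as mx
   import numpy as np

   mx.random.seed(22)
   net = resnet18_v1(pretrained=False)
   net.hybridize()
   net.initialize()

   np.random.seed(115)
   x = np.random.uniform(size=(1, 3, 224, 224)).astype(np.float32)

   # Same input and same weights: forward once on CPU, once on GPU.
   y_cpu = net(mx.nd.array(x, ctx=mx.cpu())).asnumpy()
   net.reset_ctx(mx.gpu(0))
   y_gpu = net(mx.nd.array(x, ctx=mx.gpu(0))).asnumpy()

   print(np.abs(y_cpu - y_gpu).max())
   ```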



[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758382657


   @Neutron3529 I think it has nothing to do with TF32. I've tested with `NVIDIA_TF32_OVERRIDE=0` as you suggested, but the problem is not solved.



[GitHub] [incubator-mxnet] Neutron3529 commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
Neutron3529 commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758323589


   > A Torch test case. Torch shows a smaller difference between the 2080Ti and the 3090. However, MXNet on the RTX 3090 can differ by up to 0.3 in some cases.
   > 
   > ```python
   > import torch
   > import torchvision as tv
   > torch.backends.cudnn.benchmark = True
   > 
   > model = tv.models.resnet18(pretrained=True)
   > model.cuda(0)
   > model.eval()
   > 
   > # y is always 948.1921 on CPU
   > # y is always 948.1919 on RTX2080Ti whenever cudnn.benchmark is True or False
   > # y is 948.19165 on RTX3090 when cudnn.benchmark=False
   > # y varies on RTX3090 when cudnn.benchmark=True: 948.19147, 948.1919
   > x = torch.ones(1, 3, 224, 224).cuda(0)
   > y = model(x)
   > y = y.abs().sum()
   > print(y.detach().cpu().numpy())
   > ```
   
   Have you tried running with `NVIDIA_TF32_OVERRIDE=0 python`?
   The 3090 uses TF32 to accelerate training and testing by default, and setting `NVIDIA_TF32_OVERRIDE=0` disables it.
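
   On the PyTorch side there is also an in-process switch (available since PyTorch 1.7, where both flags default to True), which is roughly the framework-scoped equivalent of `NVIDIA_TF32_OVERRIDE=0`:
   ```python
   import torch

   # Disable TF32 for float32 matmuls and for cuDNN convolutions.
   torch.backends.cuda.matmul.allow_tf32 = False
   torch.backends.cudnn.allow_tf32 = False
   ```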



[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751432013


   After more tests, I found that the result also varies on the RTX 2080Ti on both MXNet 1.9.0 and MXNet 2.0.0.
   The results have a 0.005 difference in the shallow layers. ~~I think it will have more difference as the layer grows.~~
   ```python
   import os
   # os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   import mxnet as mx
   import numpy as np
   from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   
    def test_resnet():
       ctx = mx.gpu(0)
       mx_model = resnet18_v1(pretrained=True,ctx=ctx)
       mx_model.hybridize()
   
       x_mx = mx.nd.ones(shape=(1,3,224,224), ctx=ctx)
   
       y_mx = mx_model.features[0:6](x_mx)
   
       # the res is always 13064.977 on CPU
       # the res varies on RTX2080Ti/RTX3090 on both MXNet 1.9.0 and 2.0.0 without 
       # MXNET_CUDNN_AUTOTUNE_DEFAULT=0: 13064.971, 13064.976
       res = y_mx.asnumpy().sum()
   
       print(res)
   
   if __name__ == '__main__':
        test_resnet()
   ```
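
   Note that the commented-out line sets `MXNET_CUDNN_AUTOTUNE_DEFAULT` before `import mxnet`; keeping that ordering is the safe pattern, since MXNet reads the variable from the environment:
   ```python
   import os

   # Pin the cuDNN algorithm selection so the accumulation order, and
   # hence the result, is fixed across runs.
   os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

   import mxnet as mx  # import only after the variable is set
   ```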



[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751439240


   A Torch test case. Torch shows a smaller difference between the 2080Ti and the 3090. However, MXNet on the RTX 3090 can differ by up to 0.3 in some cases.
   ```python
    import torch
    import torchvision as tv
    torch.backends.cudnn.benchmark = True

    model = tv.models.resnet18(pretrained=True)
    model.cuda(0)
    model.eval()

    # y is always 948.1921 on CPU
    # y is always 948.1919 on RTX2080Ti whenever cudnn.benchmark is True or False
    # y is 948.19165 on RTX3090 when cudnn.benchmark=False
    # y varies on RTX3090 when cudnn.benchmark=True: 948.19147, 948.1919
    x = torch.ones(1, 3, 224, 224).cuda(0)
    y = model(x)
    y = y.abs().sum()
    print(y.detach().cpu().numpy())
   ```
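
   For scale: TF32 keeps float32's 8 exponent bits but only 10 mantissa bits, so one TF32 rounding already allows a relative error of roughly 2^-11 (about 5e-4). A NumPy sketch that truncates float32 values to TF32 precision (truncation rather than round-to-nearest, but it shows the order of magnitude):
   ```python
   import numpy as np

   def tf32_truncate(a):
       # Zero the 13 low mantissa bits of each float32 value, keeping
       # the 10 mantissa bits that TF32 retains.
       bits = a.view(np.uint32) & np.uint32(0xFFFFE000)
       return bits.view(np.float32)

   x = np.array([0.1], dtype=np.float32)
   print(x[0], tf32_truncate(x)[0])  # 0.1 -> 0.09997559
   ```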



[GitHub] [incubator-mxnet] chinakook edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751424572


   The result also varies with mxnet_cu110-2.0.0b20201226.
   
   Result 1 on RTX 3090 GPU with mxnet_cu110-2.0.0b20201226
   ```
      1.41821623e+00 -6.14694595e-01 -1.21822190e+00  1.47472918e+00
      1.08678900e-01 -1.53905892e+00 -2.19664723e-01  9.48607504e-01
      9.76179004e-01  1.70066428e+00  8.15666854e-01 -1.23275781e+00
      1.59943473e+00  6.92619503e-01 -1.52998209e+00 -1.63329318e-01
     -7.86948949e-02  2.69214898e-01 -6.79625511e-01  1.63082540e-01
      1.30359614e+00  3.54878873e-01  3.44506621e-01 -1.63622832e+00
     -1.83121693e+00 -2.71499276e+00 -1.90867770e+00 -1.56530845e+00
     -2.34865284e+00 -8.75126600e-01 -1.44264027e-02  2.31574321e+00
   
   ```
   
   Result 2 on an RTX 3090 GPU with mxnet_cu110-2.0.0b20201226:
   ```
      1.41812336e+00 -6.14903927e-01 -1.21819293e+00  1.47481430e+00
      1.08835243e-01 -1.53912401e+00 -2.19649285e-01  9.48447049e-01
      9.76122022e-01  1.70034528e+00  8.15561593e-01 -1.23293483e+00
      1.59933603e+00  6.92907691e-01 -1.53025889e+00 -1.63300052e-01
     -7.87986293e-02  2.69500673e-01 -6.79565012e-01  1.62798882e-01
      1.30361140e+00  3.54955018e-01  3.44288290e-01 -1.63627052e+00
     -1.83101904e+00 -2.71485925e+00 -1.90862215e+00 -1.56534243e+00
     -2.34861803e+00 -8.75208020e-01 -1.46629252e-02  2.31575775e+00
   ```
   
   Test script for MXNet 2.0.0 master; it uses resnet18_v1 to compare the CPU and GPU results and reproduce the problem.
   ```python
   # Licensed to the Apache Software Foundation (ASF) under one
   # or more contributor license agreements.  See the NOTICE file
   # distributed with this work for additional information
   # regarding copyright ownership.  The ASF licenses this file
   # to you under the Apache License, Version 2.0 (the
   # "License"); you may not use this file except in compliance
   # with the License.  You may obtain a copy of the License at
   #
   #   http://www.apache.org/licenses/LICENSE-2.0
   #
   # Unless required by applicable law or agreed to in writing,
   # software distributed under the License is distributed on an
   # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   # KIND, either express or implied.  See the License for the
   # specific language governing permissions and limitations
   # under the License.
   
   # coding: utf-8
   # pylint: disable= arguments-differ,unused-argument,missing-docstring,too-many-lines
   """ResNets, implemented in Gluon."""
   from __future__ import division
   import gluoncv as gcv
   
   __all__ = ['ResNetV1', 'ResNetV2',
              'BasicBlockV1', 'BasicBlockV2',
              'BottleneckV1', 'BottleneckV2',
              'resnet18_v1', 'resnet34_v1', 'resnet50_v1', 'resnet101_v1', 'resnet152_v1',
              'resnet18_v2', 'resnet34_v2', 'resnet50_v2', 'resnet101_v2', 'resnet152_v2',
              'se_resnet18_v1', 'se_resnet34_v1', 'se_resnet50_v1',
              'se_resnet101_v1', 'se_resnet152_v1',
              'se_resnet18_v2', 'se_resnet34_v2', 'se_resnet50_v2',
              'se_resnet101_v2', 'se_resnet152_v2',
              'get_resnet']
   
   from mxnet.context import cpu
   from mxnet.gluon.block import HybridBlock
   from mxnet.gluon import nn
   from mxnet.gluon.nn import BatchNorm
   from mxnet import base
   from mxnet.util import is_np_array
   
   # Helpers
   def _conv3x3(channels, stride, in_channels):
       return nn.Conv2D(channels, kernel_size=3, strides=stride, padding=1,
                        use_bias=False, in_channels=in_channels)
   
   
   # Blocks
   class BasicBlockV1(HybridBlock):
       r"""BasicBlock V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(_conv3x3(channels, stride, in_channels))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels, 1, channels))
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(residual+x, act_type='relu')
   
           return x
   
   
   class BottleneckV1(HybridBlock):
       r"""Bottleneck V1 from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       This is used for ResNet V1 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV1, self).__init__(**kwargs)
           self.body = nn.HybridSequential()
           self.body.add(nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(_conv3x3(channels//4, stride, channels//4))
           self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.body.add(nn.Activation('relu'))
           self.body.add(nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False))
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if not last_gamma:
               self.body.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.body.add(norm_layer(gamma_initializer='zeros',
                                        **({} if norm_kwargs is None else norm_kwargs)))
   
           if downsample:
               self.downsample = nn.HybridSequential()
               self.downsample.add(nn.Conv2D(channels, kernel_size=1, strides=stride,
                                             use_bias=False, in_channels=in_channels))
               self.downsample.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
   
           x = self.body(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           if self.downsample:
               residual = self.downsample(residual)
   
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x + residual, act_type='relu')
           return x
   
   
   class BasicBlockV2(HybridBlock):
       r"""BasicBlock V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 18, 34 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False,
                    norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BasicBlockV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = _conv3x3(channels, stride, in_channels)
           if not last_gamma:
               self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn2 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels, 1, channels)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   class BottleneckV2(HybridBlock):
       r"""Bottleneck V2 from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
       This is used for ResNet V2 for 50, 101, 152 layers.
   
       Parameters
       ----------
       channels : int
           Number of output channels.
       stride : int
           Stride size.
       downsample : bool, default False
           Whether to downsample the input.
       in_channels : int, default 0
           Number of input channels. Default is 0, to infer from the graph.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, channels, stride, downsample=False, in_channels=0,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(BottleneckV2, self).__init__(**kwargs)
           self.bn1 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv1 = nn.Conv2D(channels//4, kernel_size=1, strides=1, use_bias=False)
           self.bn2 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           self.conv2 = _conv3x3(channels//4, stride, channels//4)
           if not last_gamma:
               self.bn3 = norm_layer(**({} if norm_kwargs is None else norm_kwargs))
           else:
               self.bn3 = norm_layer(gamma_initializer='zeros',
                                     **({} if norm_kwargs is None else norm_kwargs))
           self.conv3 = nn.Conv2D(channels, kernel_size=1, strides=1, use_bias=False)
   
           if use_se:
               self.se = nn.HybridSequential()
               self.se.add(nn.Dense(channels // 16, use_bias=False))
               self.se.add(nn.Activation('relu'))
               self.se.add(nn.Dense(channels, use_bias=False))
               self.se.add(nn.Activation('sigmoid'))
           else:
               self.se = None
   
           if downsample:
               self.downsample = nn.Conv2D(channels, 1, stride, use_bias=False,
                                           in_channels=in_channels)
           else:
               self.downsample = None
   
       def hybrid_forward(self, F, x):
           residual = x
           x = self.bn1(x)
           act = F.npx.activation if is_np_array() else F.Activation
           x = act(x, act_type='relu')
           if self.downsample:
               residual = self.downsample(x)
           x = self.conv1(x)
   
           x = self.bn2(x)
           x = act(x, act_type='relu')
           x = self.conv2(x)
   
           x = self.bn3(x)
           x = act(x, act_type='relu')
           x = self.conv3(x)
   
           if self.se:
               w = F.contrib.AdaptiveAvgPooling2D(x, output_size=1)
               w = self.se(w)
               x = F.broadcast_mul(x, w.expand_dims(axis=2).expand_dims(axis=2))
   
           return x + residual
   
   
   # Nets
   class ResNetV1(HybridBlock):
       r"""ResNet V1 model from
       `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV1, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=channels[i],
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           self.features.add(nn.GlobalAvgPool2D())
   
           self.output = nn.Dense(classes, in_units=channels[-1])
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
   
           return x
   
   
   class ResNetV2(HybridBlock):
       r"""ResNet V2 model from
       `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       block : HybridBlock
           Class for the residual block. Options are BasicBlockV1, BottleneckV1.
       layers : list of int
           Numbers of layers in each block
       channels : list of int
           Numbers of channels in each block. Length should be one larger than layers list.
       classes : int, default 1000
           Number of classification classes.
       thumbnail : bool, default False
           Enable thumbnail.
       last_gamma : bool, default False
           Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       def __init__(self, block, layers, channels, classes=1000, thumbnail=False,
                    last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
           super(ResNetV2, self).__init__(**kwargs)
           assert len(layers) == len(channels) - 1
           self.features = nn.HybridSequential()
           self.features.add(norm_layer(scale=False, center=False,
                                        **({} if norm_kwargs is None else norm_kwargs)))
           if thumbnail:
               self.features.add(_conv3x3(channels[0], 1, 0))
           else:
               self.features.add(nn.Conv2D(channels[0], 7, 2, 3, use_bias=False))
               self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
               self.features.add(nn.Activation('relu'))
               self.features.add(nn.MaxPool2D(3, 2, 1))
   
           in_channels = channels[0]
           for i, num_layer in enumerate(layers):
               stride = 1 if i == 0 else 2
               self.features.add(self._make_layer(block, num_layer, channels[i+1],
                                                  stride, i+1, in_channels=in_channels,
                                                  last_gamma=last_gamma, use_se=use_se,
                                                  norm_layer=norm_layer, norm_kwargs=norm_kwargs))
               in_channels = channels[i+1]
           self.features.add(norm_layer(**({} if norm_kwargs is None else norm_kwargs)))
           self.features.add(nn.Activation('relu'))
           self.features.add(nn.GlobalAvgPool2D())
           self.features.add(nn.Flatten())
   
           self.output = nn.Dense(classes, in_units=in_channels)
   
       def _make_layer(self, block, layers, channels, stride, stage_index, in_channels=0,
                       last_gamma=False, use_se=False, norm_layer=BatchNorm, norm_kwargs=None):
           layer = nn.HybridSequential()
           layer.add(block(channels, stride, channels != in_channels, in_channels=in_channels,
                           last_gamma=last_gamma, use_se=use_se,
                           norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           for _ in range(layers-1):
               layer.add(block(channels, 1, False, in_channels=channels,
                               last_gamma=last_gamma, use_se=use_se,
                               norm_layer=norm_layer, norm_kwargs=norm_kwargs))
           return layer
   
       def hybrid_forward(self, F, x):
           x = self.features(x)
           x = self.output(x)
           return x
   
   
   # Specification
   resnet_spec = {18: ('basic_block', [2, 2, 2, 2], [64, 64, 128, 256, 512]),
                  34: ('basic_block', [3, 4, 6, 3], [64, 64, 128, 256, 512]),
                  50: ('bottle_neck', [3, 4, 6, 3], [64, 256, 512, 1024, 2048]),
                  101: ('bottle_neck', [3, 4, 23, 3], [64, 256, 512, 1024, 2048]),
                  152: ('bottle_neck', [3, 8, 36, 3], [64, 256, 512, 1024, 2048])}
   
   resnet_net_versions = [ResNetV1, ResNetV2]
   resnet_block_versions = [{'basic_block': BasicBlockV1, 'bottle_neck': BottleneckV1},
                            {'basic_block': BasicBlockV2, 'bottle_neck': BottleneckV2}]
   
   
   # Constructor
   def get_resnet(version, num_layers, pretrained=False, ctx=cpu(),
                  root='~/.mxnet/models', use_se=False, **kwargs):
       r"""ResNet V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
       ResNet V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       version : int
           Version of ResNet. Options are 1, 2.
       num_layers : int
           Numbers of layers. Options are 18, 34, 50, 101, 152.
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default $MXNET_HOME/models
           Location for keeping the model parameters.
       use_se : bool, default False
           Whether to use Squeeze-and-Excitation module
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       assert num_layers in resnet_spec, \
           "Invalid number of layers: %d. Options are %s"%(
               num_layers, str(resnet_spec.keys()))
       block_type, layers, channels = resnet_spec[num_layers]
       assert 1 <= version <= 2, \
           "Invalid resnet version: %d. Options are 1 and 2."%version
       resnet_class = resnet_net_versions[version-1]
       block_class = resnet_block_versions[version-1][block_type]
       net = resnet_class(block_class, layers, channels, use_se=use_se, **kwargs)
       if pretrained:
           
           from gluoncv.model_zoo.model_store import get_model_file
           if not use_se:
               net.load_parameters(get_model_file('resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           else:
               net.load_parameters(get_model_file('se_resnet%d_v%d'%(num_layers, version),
                                                  tag=pretrained, root=root), ctx=ctx)
           from gluoncv.data import ImageNet1kAttr
           attrib = ImageNet1kAttr()
           net.synset = attrib.synset
           net.classes = attrib.classes
           net.classes_long = attrib.classes_long
       return net
   
   def resnet18_v1(**kwargs):
       r"""ResNet-18 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=False, **kwargs)
   
   def resnet34_v1(**kwargs):
       r"""ResNet-34 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=False, **kwargs)
   
   def resnet50_v1(**kwargs):
       r"""ResNet-50 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=False, **kwargs)
   
   def resnet101_v1(**kwargs):
       r"""ResNet-101 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=False, **kwargs)
   
   def resnet152_v1(**kwargs):
       r"""ResNet-152 V1 model from `"Deep Residual Learning for Image Recognition"
       <http://arxiv.org/abs/1512.03385>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=False, **kwargs)
   
   def resnet18_v2(**kwargs):
       r"""ResNet-18 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=False, **kwargs)
   
   def resnet34_v2(**kwargs):
       r"""ResNet-34 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=False, **kwargs)
   
   def resnet50_v2(**kwargs):
       r"""ResNet-50 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=False, **kwargs)
   
   def resnet101_v2(**kwargs):
       r"""ResNet-101 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=False, **kwargs)
   
   def resnet152_v2(**kwargs):
       r"""ResNet-152 V2 model from `"Identity Mappings in Deep Residual Networks"
       <https://arxiv.org/abs/1603.05027>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=False, **kwargs)
   
   # SE-ResNet
   def se_resnet18_v1(**kwargs):
       r"""SE-ResNet-18 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 18, use_se=True, **kwargs)
   
   def se_resnet34_v1(**kwargs):
       r"""SE-ResNet-34 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 34, use_se=True, **kwargs)
   
   def se_resnet50_v1(**kwargs):
       r"""SE-ResNet-50 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 50, use_se=True, **kwargs)
   
   def se_resnet101_v1(**kwargs):
       r"""SE-ResNet-101 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 101, use_se=True, **kwargs)
   
   def se_resnet152_v1(**kwargs):
       r"""SE-ResNet-152 V1 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(1, 152, use_se=True, **kwargs)
   
   def se_resnet18_v2(**kwargs):
       r"""SE-ResNet-18 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 18, use_se=True, **kwargs)
   
   def se_resnet34_v2(**kwargs):
       r"""SE-ResNet-34 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 34, use_se=True, **kwargs)
   
   def se_resnet50_v2(**kwargs):
       r"""SE-ResNet-50 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 50, use_se=True, **kwargs)
   
   def se_resnet101_v2(**kwargs):
       r"""SE-ResNet-101 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 101, use_se=True, **kwargs)
   
   def se_resnet152_v2(**kwargs):
       r"""SE-ResNet-152 V2 model from `"Squeeze-and-Excitation Networks"
       <https://arxiv.org/abs/1709.01507>`_ paper.
   
       Parameters
       ----------
       pretrained : bool or str
           Boolean value controls whether to load the default pretrained weights for model.
           String value represents the hashtag for a certain version of pretrained weights.
       ctx : Context, default CPU
           The context in which to load the pretrained weights.
       root : str, default '$MXNET_HOME/models'
           Location for keeping the model parameters.
       norm_layer : object
           Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
           Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       norm_kwargs : dict
           Additional `norm_layer` arguments, for example `num_devices=4`
           for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
       """
       return get_resnet(2, 152, use_se=True, **kwargs)
   
   ```
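
   A minimal driver for the script above, as a sketch (the module name `resnet_test` is hypothetical; it assumes the script is saved as `resnet_test.py`):
   ```python
   import mxnet as mx
   from resnet_test import resnet18_v1  # hypothetical module name for the script above

   # Run the same pretrained network on CPU and GPU and compare outputs.
   for ctx in (mx.cpu(), mx.gpu(0)):
       net = resnet18_v1(pretrained=True, ctx=ctx)
       net.hybridize()
       x = mx.nd.ones(shape=(1, 3, 224, 224), ctx=ctx)
       print(ctx, net(x)[0, :8].asnumpy())
   ```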



[GitHub] [incubator-mxnet] Neutron3529 edited a comment on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
Neutron3529 edited a comment on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758323589


   > After more tests, I found that the result also varies on the RTX 2080Ti on both MXNet 1.9.0 and MXNet 2.0.0.
   > ~The results differ by about 0.005 in the shallow layers. I expect the difference to grow in deeper layers.~
   > 
   > ```python
   > import os
   > # os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   > import mxnet as mx
   > import numpy as np
   > from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   > 
   > def testrestnet():
   >     ctx = mx.gpu(0)
   >     mx_model = resnet18_v1(pretrained=True,ctx=ctx)
   >     mx_model.hybridize()
   > 
   >     x_mx = mx.nd.ones(shape=(1,3,224,224), ctx=ctx)
   > 
   >     y_mx = mx_model.features[0:6](x_mx)
   > 
   >     # the res is always 13064.977 on CPU
   >     # the res varies on RTX2080Ti/RTX3090 on both MXNet 1.9.0 and 2.0.0 without 
   >     # MXNET_CUDNN_AUTOTUNE_DEFAULT=0: 13064.971, 13064.976
   >     res = y_mx.asnumpy().sum()
   > 
   >     print(res)
   > 
   > if __name__ == '__main__':
   >     testrestnet()
   > ```
   
   Have you tried running with `NVIDIA_TF32_OVERRIDE=0 python`?
   The 3090 uses TF32 to accelerate training and inference by default, and setting `NVIDIA_TF32_OVERRIDE=0` disables it.
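
   For example, a sketch of one way to apply it (the CUDA libraries read `NVIDIA_TF32_OVERRIDE` when they initialize, so it has to be in the environment before they load):
   ```python
   import os

   # Disable TF32 globally for cuBLAS/cuDNN before MXNet loads them.
   os.environ['NVIDIA_TF32_OVERRIDE'] = '0'

   import mxnet as mx
   ```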



[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751432013


   After more tests, I found that the result also varies on the RTX 2080Ti on both MXNet 1.9.0 and MXNet 2.0.0.
   The results differ by about 0.005 in the shallow layers. I expect the difference to grow in deeper layers.
   ```python
   import os
   # os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   import mxnet as mx
   import numpy as np
   from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   
   def testrestnet():
       ctx = mx.gpu(0)
       mx_model = resnet18_v1(pretrained=True,ctx=ctx)
       mx_model.hybridize()
   
       x_mx = mx.nd.ones(shape=(1,3,224,224), ctx=ctx)
   
       y_mx = mx_model.features[0:6](x_mx)
   
       # the res is always 13064.977 on CPU
       # the res varies on RTX2080Ti/RTX3090 on both MXNet 1.9.0 and 2.0.0 without 
       # MXNET_CUDNN_AUTOTUNE_DEFAULT=0: 13064.971, 13064.976
       res = y_mx.asnumpy().sum()
   
       print(res)
   
   if __name__ == '__main__':
       testrestnet()
   ```



[GitHub] [incubator-mxnet] chinakook commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
chinakook commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-751420921


   The official mxnet_cu110-1.9.0b20201226 build is good. I'll run more tests to find the cause.



[GitHub] [incubator-mxnet] Neutron3529 commented on issue #19649: Results are significant different between RTX 2080Ti and RTX 3090

Posted by GitBox <gi...@apache.org>.
Neutron3529 commented on issue #19649:
URL: https://github.com/apache/incubator-mxnet/issues/19649#issuecomment-758560138


   > @Neutron3529 I think it has nothing to do with TF32. I've tested with `NVIDIA_TF32_OVERRIDE=0` as you suggested, and the problem is not solved.
   
   My result (v1.x, compiled by myself):
   ```python
   >>> import os
   >>> os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   >>> import mxnet as mx
   >>> import numpy as np
   >>> from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   >>> def testrestnet(ctx=mx.gpu(0)):
       mx_model = resnet18_v1(pretrained=True,ctx=ctx)
       mx_model.hybridize()                           
       x_mx = mx.nd.ones(shape=(1,3,224,224), ctx=ctx)
       y_mx = mx_model.features[0:6](x_mx)
       res = y_mx.asnumpy().sum()
       print(res)
   ... 
   >>> testrestnet(mx.cpu())
   Downloading /me/mxnet/models/resnet18_v1-a0666292.zipa165046a-afde-4d5a-a034-0163a93f6047 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
   13064.974
   >>> testrestnet(mx.cpu())
   13064.974
   >>> testrestnet(mx.cpu())
   13064.974
   >>> testrestnet(mx.gpu())
   13064.976
   >>> testrestnet(mx.gpu())
   13064.976
   --- the same test run without NVIDIA_TF32_OVERRIDE=0 ---
   >>> import os
   >>> os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   >>> import mxnet as mx
   >>> import numpy as np
   >>> from mxnet.gluon.model_zoo.vision.resnet import resnet18_v1
   >>> def testrestnet(ctx=mx.gpu(0)):
       mx_model = resnet18_v1(pretrained=True,ctx=ctx)
       mx_model.hybridize()                           
       x_mx = mx.nd.ones(shape=(1,3,224,224), ctx=ctx)
       y_mx = mx_model.features[0:6](x_mx)
       res = y_mx.asnumpy().sum()
       print(res)
   ... 
   >>> testrestnet()
   13065.814
   >>> testrestnet(mx.cpu())
   13064.974
   ```
   It seems that `NVIDIA_TF32_OVERRIDE=0` works for me, and running without it introduces a large bias.
   
   What's more, my CPU generated a different result (`13064.974` vs `13064.977`) compared to yours, so maybe the error is normal floating-point variation and not worth an issue.
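
   For intuition about the size of the TF32 effect: TF32 keeps FP32's 8-bit exponent but only 10 explicit mantissa bits (FP32 has 23), so multiply inputs are rounded to roughly 3 significant decimal digits. A rough sketch that simulates this by truncating the low 13 mantissa bits (real TF32 rounds to nearest, so this is only an approximation):
   ```python
   import numpy as np

   def to_tf32(x):
       """Approximate TF32 by zeroing the low 13 mantissa bits of FP32."""
       bits = np.asarray(x, dtype=np.float32).view(np.uint32)
       return (bits & np.uint32(0xFFFFE000)).view(np.float32)

   x = np.float32(1.2345678)
   print(x, to_tf32(x))  # differs in the 4th decimal place
   ```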

