Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/10/10 07:57:43 UTC

[GitHub] [incubator-mxnet] CanyonWind opened a new issue #16424: [Channel Shuffle / Hard Swish / Hard Sigmoid] running in MKL CPU backend failed

URL: https://github.com/apache/incubator-mxnet/issues/16424
 
 
   ## Description
   I've trained a [NAS-searched ShuffleNet-related model](https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS), which contains some less common operators such as Channel Shuffle, hard Swish, and hard Sigmoid. It runs fine on both the GPU and plain CPU backends but fails (val_acc = 0.0) with the MKL CPU backend.
   
   ## Environment info (Required)
   
   ```
   ----------Python Info----------
   Version      : 3.7.3
   Compiler     : Clang 10.0.1 (clang-1001.0.46.3)
   Build        : ('default', 'Mar 27 2019 09:23:15')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 19.0.3
   Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.6.0
   Directory    : /Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet
   Commit Hash   : 1d0d1e687fdf436896f8ca106c4915adfd29c8cb
   Library      : ['/Users/yaoxi/projects/oneshot_nas/nasenv/lib/python3.7/site-packages/mxnet/libmxnet.so']
   Build features:
   ✖ CUDA
   ✖ CUDNN
   ✖ NCCL
   ✖ CUDA_RTC
   ✖ TENSORRT
   ✔ CPU_SSE
   ✔ CPU_SSE2
   ✔ CPU_SSE3
   ✔ CPU_SSE4_1
   ✔ CPU_SSE4_2
   ✖ CPU_SSE4A
   ✔ CPU_AVX
   ✖ CPU_AVX2
   ✖ OPENMP
   ✖ SSE
   ✖ F16C
   ✖ JEMALLOC
   ✖ BLAS_OPEN
   ✖ BLAS_ATLAS
   ✖ BLAS_MKL
   ✔ BLAS_APPLE
   ✔ LAPACK
   ✔ MKLDNN
   ✔ OPENCV
   ✖ CAFFE
   ✖ PROFILER
   ✔ DIST_KVSTORE
   ✖ CXX14
   ✖ INT64_TENSOR_SIZE
   ✔ SIGNAL_HANDLER
   ✖ DEBUG
   ✖ TVM_OP
   ----------System Info----------
   Platform     : Darwin-18.2.0-x86_64-i386-64bit
   system       : Darwin
   node         : yaoxis-MacBook-Pro.local
   release      : 18.2.0
   version      : Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64
   ----------Hardware Info----------
   machine      : x86_64
   processor    : i386
   b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-8259U CPU @ 2.30GHz'
   b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
   b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
   b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0358 sec, LOAD: 0.6333 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0269 sec, LOAD: 0.1421 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0581 sec, LOAD: 0.2112 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0193 sec, LOAD: 0.1315 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0483 sec, LOAD: 0.6375 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0008 sec, LOAD: 0.2240 sec.
   ----------Environment----------
   ```
   
   ## Background
   This ShuffleNet-related model is built from the following layers and ops:
   
   | Layers                   | Ops     |
   | :------------------------ | :-----: |
   | Common ops                | conv, BN, Activation('relu') |
   | Concat                    | concat |
   | Shuffle Channel & Slice   | reshape-swapaxes-reshape-slice |
   | Hard swish                | plusscalar-clip-divscalar-mul |
   | Hard sigmoid              | plusscalar-clip-divscalar |
   | Global Average Pooling    | pool |
   
   Channel Shuffle is implemented as illustrated below:
   
   ![alt text](https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/master/images/Channel_Shuffle_and_Split.png?raw=true)
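   
   For reference, here is a minimal runnable sketch (not the exact model code) of how these ops map onto the reshape/swapaxes/clip primitives listed in the table above, using the mx.nd API:
   
   ```python
   import mxnet as mx
   
   def channel_shuffle(x, groups):
       # reshape-swapaxes-reshape: split channels into `groups`, transpose, flatten back
       batch, channels, height, width = x.shape
       x = x.reshape(batch, groups, channels // groups, height, width)
       x = x.swapaxes(1, 2)
       return x.reshape(batch, channels, height, width)
   
   def hard_sigmoid(x):
       # plusscalar -> clip -> divscalar
       return mx.nd.clip(x + 3.0, 0.0, 6.0) / 6.0
   
   def hard_swish(x):
       # plusscalar -> clip -> divscalar -> mul
       return x * hard_sigmoid(x)
   
   x = mx.nd.random.uniform(shape=(2, 8, 4, 4))
   print(channel_shuffle(x, groups=2).shape)  # (2, 8, 4, 4)
   print(hard_swish(x).shape)                 # (2, 8, 4, 4)
   ```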
   
   ## Error Message
   The plan was to quantize the model with the MXNet 1.6.0 (master) [quantization tool](https://github.com/apache/incubator-mxnet/tree/master/example/quantization). The "error" occurs when running either the raw model (before quantization) or the quantized model with the **MKL backend**.
   
   - When using [imagenet_inference.py](https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/master/quantization/imagenet_inference.py) with the CPU-only MXNet build (no MKL), it works fine:
   ```sh
   INFO:logger:('accuracy', 0.771875)
   INFO:logger:('top_k_accuracy_5', 0.909375)
   ```
   
   - When using the same [imagenet_inference.py](https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/master/quantization/imagenet_inference.py) with MXNet-mkl, it **fails**:
   ```sh
   INFO:logger:('accuracy', 0.0)
   INFO:logger:('top_k_accuracy_5', 0.003125)
   ```
   
   - Interestingly, with MXNet-mkl the quantization process itself runs smoothly and generates a quantized model. But when using [imagenet_inference.py](https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS/blob/master/quantization/imagenet_inference.py) to verify the quantized model's accuracy, it **fails again**, just like the raw model before quantization:
   ```
   INFO:logger:('accuracy', 0.0)
   INFO:logger:('top_k_accuracy_5', 0.003125)
   ```
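   
   A possible way to narrow this down (assuming the `MXNET_MKLDNN_ENABLED` environment variable behaves as documented) is to keep the mxnet-mkl build installed but disable its MKL-DNN operators at runtime and re-run the same script:
   
   ```sh
   # if accuracy recovers with MKL-DNN operators disabled, the regression is
   # likely in an MKL-DNN operator path rather than in the model itself
   MXNET_MKLDNN_ENABLED=0 python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
   ```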
   
   ## Steps to reproduce
   1. Clone the repository; the model is already included:
   ```
   git clone https://github.com/CanyonWind/MXNet-Single-Path-One-Shot-NAS.git
   ```
   2. Reproduce with the CPU-only MXNet build (**without MKL**):
   ```
   pip install mxnet --pre
   cd MXNet-Single-Path-One-Shot-NAS/quantization/
   python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
   
   ...
   # output would be like:
   INFO:logger:('accuracy', 0.771875)
   INFO:logger:('top_k_accuracy_5', 0.909375)
   ```
   
   3. Reproduce the failing validation accuracy with **MXNet-mkl**:
   ```sh
   # remove the previously installed non-MKL MXNet build
   pip uninstall mxnet
   ```
   
   ```sh
   pip install mxnet-mkl --pre
   cd MXNet-Single-Path-One-Shot-NAS/quantization/
   python3 imagenet_inference.py --symbol-file model/ShuffleNas_fixArch-symbol.json --param-file model/ShuffleNas_fixArch-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=5 --dataset=./data/val_256_q90.rec --ctx=cpu
   
   ...
   # output would be like:
   INFO:logger:('accuracy', 0.0)
   INFO:logger:('top_k_accuracy_5', 0.003125)
   ```
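   
   To double-check which build is active before each run, a snippet like this can help (assuming MXNet >= 1.5, where `mxnet.runtime.Features` is available):
   
   ```python
   import mxnet as mx
   from mxnet.runtime import Features
   
   print(mx.__version__)
   # True for the mxnet-mkl build, False for the plain CPU build
   print(Features().is_enabled('MKLDNN'))
   ```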
