Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/07 00:53:49 UTC

[GitHub] samhodge opened a new issue #9334: Cannot build libraries from source with cuda support on OSX with NVIDIA GeForce GTX 780M on macOS 10.12.6 using Xcode8.2 clang

URL: https://github.com/apache/incubator-mxnet/issues/9334
 
 
   Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in what you believe is the best form.
   
   For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io 
   
   ## Description
   Cannot build libraries from source with cuda support on OSX with NVIDIA GeForce GTX 780M on macOS 10.12.6 using Xcode8.2 clang
   
   ## Environment info (Required)
   
   ```
   What to do:
   1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
   2. Run the script using `python diagnose.py` and paste its output here.
   
   ```
   HodgeFalysiMac2:mxnet hodgefamily$ python diagnose.py 
   ----------Python Info----------
   ('Version      :', '2.7.14')
   ('Compiler     :', 'GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)')
   ('Build        :', ('default', 'Oct  5 2017 02:28:52'))
   ('Arch         :', ('64bit', ''))
   ------------Pip Info-----------
   ('Version      :', '9.0.1')
   ('Directory    :', '/Users/hodgefamily/anaconda2/lib/python2.7/site-packages/pip')
   ----------MXNet Info-----------
   /Users/hodgefamily/anaconda2/lib/python2.7/site-packages/urllib3/contrib/pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
     import OpenSSL.SSL
   ('Version      :', '1.0.0')
   ('Directory    :', '/Users/hodgefamily/anaconda2/lib/python2.7/site-packages/mxnet')
   Traceback (most recent call last):
     File "diagnose.py", line 171, in <module>
       check_mxnet()
     File "diagnose.py", line 113, in check_mxnet
       except FileNotFoundError:
   NameError: global name 'FileNotFoundError' is not defined
   
   I don't know how useful that information is.
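
The traceback above is consistent with the script being run under Python 2.7, where the built-in `FileNotFoundError` does not exist (it was added in Python 3.3). A minimal sketch of a check that behaves the same on both versions follows. The helper name `read_text_or_none` is hypothetical and this is not the code in `diagnose.py`; it only illustrates catching the missing-file case through `errno`.

```python
import errno
import os
import tempfile


def read_text_or_none(path):
    """Return the file's contents, or None if the file does not exist.

    Hypothetical helper for illustration; intended to work on Python 2.7 and 3.x.
    """
    try:
        with open(path) as handle:
            return handle.read()
    except (IOError, OSError) as err:
        # Python 3 raises FileNotFoundError, a subclass of OSError (IOError is
        # an alias of OSError there); Python 2 raises IOError. Both set errno.
        if err.errno == errno.ENOENT:
            return None
        # Anything else (permissions, I/O failure) is re-raised unchanged.
        raise


# Usage: a path inside a fresh temporary directory is guaranteed to be missing.
scratch_dir = tempfile.mkdtemp()
print(read_text_or_none(os.path.join(scratch_dir, "missing.txt")))
```

Checking `errno.ENOENT` instead of naming `FileNotFoundError` keeps other I/O failures visible, so only the "file is absent" case is swallowed.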
   
   I am trying to install C++ libraries from source with CUDA on OSX
   
   HodgeFalysiMac2:mxnet hodgefamily$ nvcc --version
   nvcc: NVIDIA (R) Cuda compiler driver
   Copyright (c) 2005-2016 NVIDIA Corporation
   Built on Tue_Jan_10_13:22:46_CST_2017
   Cuda compilation tools, release 8.0, V8.0.61
   
   HodgeFalysiMac2:mxnet hodgefamily$ /usr/bin/clang --version
   Apple LLVM version 8.0.0 (clang-800.0.42.1)
   Target: x86_64-apple-darwin16.7.0
   Thread model: posix
   InstalledDir: /Applications/Xcode8.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
   
   HodgeFalysiMac2:mxnet hodgefamily$ system_profiler SPSoftwareDataType
   Software:
   
       System Software Overview:
   
         System Version: macOS 10.12.6 (16G29)
         Kernel Version: Darwin 16.7.0
         Boot Volume: Macintosh HD
         Boot Mode: Normal
         Computer Name: Hodge Family's iMac
         User Name: Hodge Family (hodgefamily)
         Secure Virtual Memory: Enabled
         System Integrity Protection: Disabled
         Time since boot: 6 days 21:01
   
   
       NVIDIA GeForce GTX 780M:
   
         Chipset Model: NVIDIA GeForce GTX 780M
         Type: GPU
         Bus: PCIe
         PCIe Lane Width: x16
         VRAM (Total): 4096 MB
         Vendor: NVIDIA (0x10de)
         Device ID: 0x119e
         Revision ID: 0x00a2
         ROM Revision: 3782
         Metal: Supported
         Displays:
           iMac:
             Display Type: LCD
             Resolution: 2560 x 1440
             Pixel Depth: 32-Bit Color (ARGB8888)
             Main Display: Yes
             Mirror: Off
             Online: Yes
             Rotation: Supported
             Automatically Adjust Brightness: No
             Connection Type: DisplayPort
   
   
   Package used (Python/R/Scala/Julia):
   (I'm using ...)
   
   For Scala user, please provide:
   1. Java version: (`java -version`)
   2. Maven version: (`mvn -version`)
   3. Scala runtime if applicable: (`scala -version`)
   
   For R user, please provide R `sessionInfo()`:
   
   ## Build info (Required if built from source)
   
   Compiler (Apple LLVM version 8.0.0 (clang-800.0.42.1)):
   
   MXNet commit hash:
   (a5edbf94094581ee27157eae4f2113115a3994e7)
   
   Build config:
   (Paste the content of config.mk, or the build command.)
   #-------------------------------------------------------------------------------
   #  Template configuration for compiling mxnet
   #
   #  If you want to change the configuration, please use the following
   #  steps. Assume you are on the root directory of mxnet. First copy the this
   #  file so that any local changes will be ignored by git
   #
   #  $ cp make/config.mk .
   #
   #  Next modify the according entries, and then compile by
   #
   #  $ make
   #
   #  or build in parallel with 8 threads
   #
   #  $ make -j8
   #-------------------------------------------------------------------------------
   
   #---------------------
   # choice of compiler
   #--------------------
   
   export CC = /usr/bin/clang
   export CXX = /usr/bin/clang++
   export NVCC = nvcc
   
   # whether compile with options for MXNet developer
   DEV = 0
   
   # whether compile with debug
   DEBUG = 0
   
   # the additional link flags you want to add
   ADD_LDFLAGS =
   
   # the additional compile flags you want to add
   ADD_CFLAGS =
   
   #---------------------------------------------
   # matrix computation libraries for CPU/GPU
   #---------------------------------------------
   
   # whether use CUDA during compile
   USE_CUDA = 1
   
   # add the path to CUDA library to link and compile flag
   # if you have already add them to environment variable, leave it as NONE
   # USE_CUDA_PATH = /usr/local/cuda
   USE_CUDA_PATH = /usr/local/cuda
   
   # whether use CUDNN R3 library
   USE_CUDNN = 1
   
   # whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
   USE_NVRTC = 0
   
   # whether use opencv during compilation
   # you can disable it, however, you will not able to use
   # imbin iterator
   USE_OPENCV = 0
   
   # use openmp for parallelization
   USE_OPENMP = 0
   
   # choose the version of blas you want to use
   # can be: mkl, blas, atlas, openblas
   USE_BLAS = apple
   
   # whether use lapack during compilation
   # only effective when compiled with blas versions openblas/apple/atlas/mkl
   USE_LAPACK = 1
   
   # add path to intel library, you may need it for MKL, if you did not add the path
   # to environment variable
   USE_INTEL_PATH = NONE
   
   #----------------------------
   # distributed computing
   #----------------------------
   
   # whether or not to enable multi-machine supporting
   USE_DIST_KVSTORE = 0
   
   # whether or not allow to read and write HDFS directly. If yes, then hadoop is
   # required
   USE_HDFS = 0
   
   # path to libjvm.so. required if USE_HDFS=1
   LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server
   
   # whether or not allow to read and write AWS S3 directly. If yes, then
   # libcurl4-openssl-dev is required, it can be installed on Ubuntu by
   # sudo apt-get install -y libcurl4-openssl-dev
   USE_S3 = 0
   
   #----------------------------
   # additional operators
   #----------------------------
   
   # path to folders containing projects specific operators that you don't want to put in src/operators
   EXTRA_OPERATORS =
   
   
   #----------------------------
   # plugins
   #----------------------------
   
   # whether to use torch integration. This requires installing torch.
   # TORCH_PATH = $(HOME)/torch
   # MXNET_PLUGINS += plugin/torch/torch.mk
   USE_BLAS = openblas
   ADD_CFLAGS += -I/usr/local/opt/openblas/include
   ADD_LDFLAGS += -L/usr/local/opt/openblas/lib
   ADD_LDFLAGS += -L/usr/local/lib/graphviz/
   USE_CPP_PACKAGE=1
   
   ## Error Message:
   (Paste the complete error message, including stack trace.)
   HodgeFalysiMac2:mxnet hodgefamily$ make 
   cd /Volumes/Hodge/dev/mxnet/dmlc-core; /Applications/Xcode8.2.app/Contents/Developer/usr/bin/make libdmlc.a USE_SSE=1 config=/Volumes/Hodge/dev/mxnet/config.mk; cd /Volumes/Hodge/dev/mxnet
   make[1]: `libdmlc.a' is up to date.
   /usr/local/cuda/bin/nvcc -std=c++11 -Xcompiler -D_FORCE_INLINES -O3 -ccbin /usr/bin/clang++    -Xcompiler "-DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/Volumes/Hodge/dev/mxnet/mshadow/ -I/Volumes/Hodge/dev/mxnet/dmlc-core/include -fPIC -I/Volumes/Hodge/dev/mxnet/nnvm/include -I/Volumes/Hodge/dev/mxnet/dlpack/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=0 -DMSHADOW_USE_CUDNN=1  -I/usr/local/opt/openblas/include -I/Volumes/Hodge/dev/mxnet/cub -DMXNET_USE_NVRTC=0" -M -MT build/src/operator/convolution_gpu.o src/operator/convolution.cu >build/src/operator/convolution_gpu.d
   nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
   /usr/local/cuda/bin/nvcc -c -o build/src/operator/convolution_gpu.o -std=c++11 -Xcompiler -D_FORCE_INLINES -O3 -ccbin /usr/bin/clang++    -Xcompiler "-DMSHADOW_FORCE_STREAM -Wall -Wsign-compare -O3 -DNDEBUG=1 -I/Volumes/Hodge/dev/mxnet/mshadow/ -I/Volumes/Hodge/dev/mxnet/dmlc-core/include -fPIC -I/Volumes/Hodge/dev/mxnet/nnvm/include -I/Volumes/Hodge/dev/mxnet/dlpack/include -Iinclude -funroll-loops -Wno-unused-variable -Wno-unused-parameter -Wno-unknown-pragmas -Wno-unused-local-typedefs -msse3 -I/usr/local/cuda/include -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -DMSHADOW_USE_PASCAL=0 -DMXNET_USE_OPENCV=0 -DMSHADOW_USE_CUDNN=1  -I/usr/local/opt/openblas/include -I/Volumes/Hodge/dev/mxnet/cub -DMXNET_USE_NVRTC=0" src/operator/convolution.cu
   nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
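
Neither nvcc invocation above carries an `-arch` or `-gencode` option, and the warning names `compute_20` and `sm_20`, which suggests nvcc fell back to its default target. The warp shuffle intrinsics such as `__shfl_down` need compute capability 3.0 or higher, so that fallback would explain the "undefined" errors. The sketch below is a hypothetical diagnostic helper, not part of MXNet or CUDA, that lists the architectures named on an nvcc command line so the build log can be checked quickly.

```python
import re
import shlex

# Matches the numeric part of sm_30, compute_35 and similar names.
ARCH_PATTERN = re.compile(r"(?:sm|compute)_(\d+)")
# nvcc options that select target architectures (short and long spellings).
ARCH_OPTIONS = ("-arch", "--gpu-architecture", "-gencode", "--generate-code")


def requested_archs(command):
    """Return the sorted architecture numbers named on an nvcc command line."""
    tokens = shlex.split(command)
    found = set()
    for index, token in enumerate(tokens):
        name, _, value = token.partition("=")
        if name not in ARCH_OPTIONS:
            continue
        # The value is either attached with '=' or given as the next token.
        if not value and index + 1 < len(tokens):
            value = tokens[index + 1]
        found.update(int(num) for num in ARCH_PATTERN.findall(value))
    return sorted(found)


# Usage: a shortened form of the failing command, which has no arch flags.
command = "nvcc -std=c++11 -O3 -ccbin /usr/bin/clang++ -c src/operator/convolution.cu"
archs = requested_archs(command)
if not archs:
    print("no architecture flags: nvcc will use its default target")
elif min(archs) < 30:
    print("some targets are below sm_30, where __shfl_down is unavailable")
else:
    print("all targets are sm_30 or newer: {}".format(archs))
```

If the check reports no flags, one thing to try is passing an explicit target such as `-gencode arch=compute_30,code=sm_30` to nvcc; the GTX 780M is, as far as I know, a compute capability 3.0 device. How that flag should be fed into the MXNet build is not shown in this report and would need to be confirmed against the Makefile.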
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=8]" 
   (688): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=4]" 
   (691): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=32]" 
   (648): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=16]" 
   (651): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8, kFilterHeight=3, kFilterWidth=3]" 
   (626): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(479): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   src/operator/./depthwise_convolution_tf.cuh(507): error: identifier "__shfl_down" is undefined
             detected during:
               instantiation of "void tf::depthwise_conv::cuda::DepthwiseConv2dBackwardFilterKernelSmall<DType,kBlockSlices,kAccumPixels,kFilterHeight,kFilterWidth>(tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8, kFilterHeight=-1, kFilterWidth=-1]" 
   (630): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices,kAccumPixels>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2, kAccumPixels=8]" 
   (654): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall<DType,kBlockSlices>(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, int, const DType *, const DType *, DType *) [with DType=float, kBlockSlices=2]" 
   (694): here
               instantiation of "__nv_bool tf::depthwise_conv::TryLaunchDepthwiseConv2dBackwardFilterGPUSmall(mshadow::Stream<mshadow::gpu> *, tf::depthwise_conv::DepthwiseArgs, const DType *, const DType *, DType *) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(253): here
               instantiation of "void mxnet::op::depthwise_conv::DepthwiseConv2dBackwardFilterGpu<DType>(mshadow::Stream<mshadow::gpu> *, const tf::depthwise_conv::DepthwiseArgs &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(334): here
               instantiation of "void mxnet::op::DepthwiseConvolutionOp<DType>::Backward(const mxnet::OpContext &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::OpReqType, std::__1::allocator<mxnet::OpReqType>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &, const std::__1::vector<mxnet::TBlob, std::__1::allocator<mxnet::TBlob>> &) [with DType=float]" 
   src/operator/./depthwise_convolution-inl.h(59): here
               instantiation of "mxnet::op::DepthwiseConvolutionOp<DType>::DepthwiseConvolutionOp(const mxnet::op::ConvolutionParam &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &, const std::__1::vector<nnvm::TShape, std::__1::allocator<nnvm::TShape>> &) [with DType=float]" 
   src/operator/convolution.cu(58): here
   
   36 errors detected in the compilation of "/var/folders/zs/mq_kb11n6fxg023lnv4ndjch0000gn/T//tmpxft_0001852d_00000000-7_convolution.cpp1.ii".
   make: *** [build/src/operator/convolution_gpu.o] Error 2
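
The log above repeats the same diagnostic once per template instantiation. As a reading aid, here is a minimal sketch that condenses such a log into its distinct error messages. It assumes the build output was saved to a file; `summarize_errors` and `build.log` are hypothetical names, not part of MXNet.

```shell
# Hypothetical helper: list each distinct compiler error message with a count.
summarize_errors() {
  # Keep only diagnostic lines, drop the "file(line)" prefix so the same
  # message reported at different locations collapses, then count duplicates.
  grep ': error: ' "$1" | sed 's/^.*: error: //' | sort | uniq -c | sort -rn
}

# Usage (build.log is an assumed file name):
#   make 2>&1 | tee build.log
#   summarize_errors build.log
```

Only lines containing `: error: ` are counted, so the trailing `make: *** [...] Error 2` line is ignored.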
   
   ## Minimum reproducible example
   (If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)
   
   make
   
   ## Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1. Download the source recursively using git.
   2. Edit `config.mk`.
   3. Run `make -j$(sysctl -n hw.ncpu)`.
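
The steps above could be sketched as a shell function. The issue does not show the exact commands or the `config.mk` changes, so the following are assumptions: the clone target directory `mxnet`, the CUDA path `/usr/local/cuda`, and passing `USE_CUDA` and `USE_CUDA_PATH` on the `make` command line instead of editing `config.mk`. The `run`, `ncpu`, and `reproduce` helpers are hypothetical names.

```shell
# Sketch only: DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Number of logical CPUs: sysctl on macOS, getconf as a fallback elsewhere.
ncpu() {
  n=$(sysctl -n hw.ncpu 2>/dev/null)
  [ -n "$n" ] || n=$(getconf _NPROCESSORS_ONLN)
  printf '%s\n' "$n"
}

reproduce() {
  # 1. Download the source recursively (repository taken from the issue URL).
  run git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
  run cd mxnet
  # 2. ASSUMPTION: the config.mk edit enabled CUDA. The same settings are
  #    passed as make variables here, which override the values in config.mk.
  # 3. Build with one job per CPU.
  run make -j"$(ncpu)" USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda
}
```

Calling `reproduce` prints the three commands; setting `DRY_RUN=0` beforehand executes them.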
   
   I want to use the C++ API, and many of its examples use the GPU.
   
   ## What have you tried to solve it?
   
   1. Compiled without CUDA support (this builds fine).
   
   
