Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/21 02:29:31 UTC

[GitHub] [incubator-mxnet] QueensGambit commented on issue #8832: I didn't get a reasonable speed-up when applying depthwise convolution to VGG16

URL: https://github.com/apache/incubator-mxnet/issues/8832#issuecomment-513513890
 
 
   Hi @lawrencewxj @edmBernard 
   For a model architecture that heavily uses depthwise separable convolutions, I achieved a 1.4x speed-up on GPU (MXNET-cu10 1.4.1, CUDA 10.0, cuDNN v7.5.1.10), but a 3x speed-up on CPU (MXNET-mkl 1.4.1).
   The main reason is that grouped convolutions cause memory fragmentation and are inherently not well suited to GPUs.
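   
   For anyone reproducing this, here is a minimal Gluon sketch of the two variants being compared (the layer sizes are hypothetical, not the ones from my model): a standard 3x3 convolution next to its depthwise separable replacement, i.e. a per-channel 3x3 convolution with `groups` equal to the channel count, followed by a 1x1 pointwise convolution.
   
   ```python
   from mxnet.gluon import nn
   
   in_channels, out_channels = 128, 128  # hypothetical VGG-style layer sizes
   
   # standard dense 3x3 convolution
   standard = nn.Conv2D(out_channels, kernel_size=3, padding=1,
                        in_channels=in_channels)
   
   # depthwise separable replacement: a per-channel 3x3 convolution
   # (groups == in_channels) followed by a 1x1 pointwise convolution
   separable = nn.HybridSequential()
   separable.add(
       nn.Conv2D(in_channels, kernel_size=3, padding=1,
                 groups=in_channels, in_channels=in_channels),
       nn.Conv2D(out_channels, kernel_size=1, in_channels=in_channels),
   )
   ```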
   
   The following paper conducted an experiment on this (see Table 2).
   **ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design** (Ma et al., 2018)
   * https://arxiv.org/pdf/1807.11164.pdf
   
   Recent cuDNN versions have been improving the performance of grouped convolutions:
   * https://docs.nvidia.com/deeplearning/sdk/pdf/cuDNN-Release-Notes.pdf
   
   It would be worth verifying that MXNet makes use of all of these recent optimizations.
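   
   A rough way to check this on a given setup is to time both blocks from the sketch above (the input shape and iteration count here are arbitrary) and swap `mx.cpu()` for `mx.gpu(0)`:
   
   ```python
   import time
   import mxnet as mx
   
   ctx = mx.cpu()  # or mx.gpu(0)
   x = mx.nd.random.uniform(shape=(1, in_channels, 224, 224), ctx=ctx)
   
   for name, block in [("standard", standard), ("separable", separable)]:
       block.initialize(ctx=ctx)
       block.hybridize()
       block(x)         # warm-up pass; triggers shape inference and caching
       mx.nd.waitall()
       start = time.time()
       for _ in range(100):
           block(x)
       mx.nd.waitall()  # NDArray execution is asynchronous; sync before timing
       print("%s: %.4f s per forward pass" % (name, (time.time() - start) / 100))
   ```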
   
   Best,
   ~QueensGambit
