Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/07/21 02:29:31 UTC
[GitHub] [incubator-mxnet] QueensGambit commented on issue #8832: I didn't get a reasonable speed-up when applying depthwise convolution to VGG16
URL: https://github.com/apache/incubator-mxnet/issues/8832#issuecomment-513513890
Hi @lawrencewxj @edmBernard
For a model architecture that heavily uses depthwise separable convolutions, I achieved a 1.4x speed-up on GPU (MXNet-cu10 1.4.1, CUDA 10.0, cuDNN v7.5.1.10), but a 3x speed-up on CPU (MXNet-mkl 1.4.1).
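The gap between the measured and the theoretical speed-up can be made concrete with a quick multiply-accumulate (MAC) count. The sketch below is a rough, hypothetical calculation (the layer size 256x56x56 is an illustrative choice, not taken from VGG16 or the model above):

```python
# Rough MAC count for one convolution layer, stride 1, "same" padding.
# Hypothetical layer: 3x3 kernel, C_in = C_out = 256, 56x56 feature map.
def conv_macs(h, w, c_in, c_out, k, groups=1):
    """Multiply-accumulates for a k x k convolution with `groups` groups."""
    return h * w * c_out * (c_in // groups) * k * k

h, w, c = 56, 56, 256
standard  = conv_macs(h, w, c, c, 3)            # regular 3x3 convolution
depthwise = conv_macs(h, w, c, c, 3, groups=c)  # 3x3 depthwise (one filter per channel)
pointwise = conv_macs(h, w, c, c, 1)            # 1x1 pointwise mixing
separable = depthwise + pointwise

print(f"standard:  {standard:,} MACs")
print(f"separable: {separable:,} MACs")
print(f"theoretical speed-up: {standard / separable:.1f}x")  # ~8.7x
```

The theoretical ~8.7x saving is far larger than the 1.4x observed on GPU, which is the point of the discussion below: arithmetic savings from grouped convolutions do not translate directly into wall-clock savings on GPUs.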
The main reason is that grouped convolutions cause memory fragmentation and are naturally not well suited to GPUs.
The following paper conducted an experiment on this (see Table 2):
**ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design** (Ma et al., 2018)
* https://arxiv.org/pdf/1807.11164.pdf
Recent cuDNN versions are improving the performance of grouped convolutions:
* https://docs.nvidia.com/deeplearning/sdk/pdf/cuDNN-Release-Notes.pdf
It would be good to verify that MXNet makes use of all these recent optimizations.
Best,
~QueensGambit