Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/25 15:03:55 UTC
[GitHub] [incubator-mxnet] chinakook opened a new issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907
## Description
Depthwise convolution on Windows is 10x slower than on Linux on the GPU.
### Error Message
Windows:

Operator | Time | Total Time | Avg Time | Count
-- | -- | -- | -- | --
SyncCopyCPU2GPU | 12.673 ms | 12.673 ms | 0.325 ms | 39
SyncCopyGPU2CPU | 10.121 ms | 10.121 ms | 0.087 ms | 117
DeleteVariable | 1.603 ms | 1.603 ms | 0.002 ms | 1014
BatchNorm | 90.036 ms | 90.036 ms | 0.066 ms | 1365
Convolution | 3,134.914 ms | 3,134.914 ms | 1.710 ms | 1833
Activation | 75.352 ms | 75.352 ms | 0.055 ms | 1365
transpose | 30.556 ms | 30.556 ms | 0.060 ms | 507
Flatten | 7.007 ms | 7.007 ms | 0.015 ms | 468
slice_axis | 19.195 ms | 19.195 ms | 0.062 ms | 312
_mul_scalar | 23.824 ms | 23.824 ms | 0.056 ms | 429
zeros_like | 3.767 ms | 3.767 ms | 0.048 ms | 78
where | 3.565 ms | 3.565 ms | 0.046 ms | 78
slice_like | 14.537 ms | 14.537 ms | 0.062 ms | 234
Reshape | 12.183 ms | 12.183 ms | 0.015 ms | 819
SliceChannel | 5.935 ms | 5.935 ms | 0.076 ms | 78
_plus_scalar | 11.447 ms | 11.447 ms | 0.059 ms | 195
exp | 4.079 ms | 4.079 ms | 0.052 ms | 78
Concat | 17.887 ms | 17.887 ms | 0.066 ms | 273
broadcast_mul | 8.639 ms | 8.639 ms | 0.055 ms | 156
_div_scalar | 3.986 ms | 3.986 ms | 0.051 ms | 78
_contrib_box_nms | 48.896 ms | 48.896 ms | 1.254 ms | 39
softmax | 2.686 ms | 2.686 ms | 0.069 ms | 39
_greater_scalar | 2.339 ms | 2.339 ms | 0.060 ms | 39
ones_like | 1.908 ms | 1.908 ms | 0.049 ms | 39
broadcast_add | 4.345 ms | 4.345 ms | 0.056 ms | 78
elemwise_add | 3.568 ms | 3.568 ms | 0.046 ms | 78
broadcast_sub | 5.719 ms | 5.719 ms | 0.147 ms | 39
broadcast_div | 2.637 ms | 2.637 ms | 0.068 ms | 39
elemwise_sub | 3.369 ms | 3.369 ms | 0.043 ms | 78
Totals | 3,566.773 ms | 3,566.773 ms | 0.357 ms | 9984
Linux:

Operator | Time | Total Time | Avg Time | Count
-- | -- | -- | -- | --
SyncCopyGPU2CPU | 4.928 ms | 4.928 ms | 0.042 ms | 117
SyncCopyCPU2GPU | 11.269 ms | 11.269 ms | 0.289 ms | 39
Activation | 24.682 ms | 24.682 ms | 0.022 ms | 1131
Convolution | 198.481 ms | 198.481 ms | 0.108 ms | 1833
BatchNorm | 40.352 ms | 40.352 ms | 0.030 ms | 1365
_FusedOp | 1,188.433 ms | 1,188.433 ms | 1.524 ms | 780
transpose | 11.064 ms | 11.064 ms | 0.022 ms | 507
Flatten | 2.973 ms | 2.973 ms | 0.006 ms | 468
softmax | 1.125 ms | 1.125 ms | 0.029 ms | 39
Concat | 9.672 ms | 9.672 ms | 0.035 ms | 273
where | 1.262 ms | 1.262 ms | 0.016 ms | 78
slice_axis | 5.564 ms | 5.564 ms | 0.020 ms | 273
zeros_like | 0.599 ms | 0.599 ms | 0.015 ms | 39
DeleteVariable | 3.213 ms | 3.213 ms | 0.003 ms | 1053
Reshape | 0.605 ms | 0.605 ms | 0.005 ms | 117
broadcast_mul | 3.145 ms | 3.145 ms | 0.020 ms | 156
broadcast_add | 1.642 ms | 1.642 ms | 0.021 ms | 78
broadcast_sub | 3.018 ms | 3.018 ms | 0.077 ms | 39
broadcast_div | 1.526 ms | 1.526 ms | 0.039 ms | 39
SliceChannel | 2.744 ms | 2.744 ms | 0.035 ms | 78
_contrib_box_nms | 32.184 ms | 32.184 ms | 0.825 ms | 39
_greater_scalar | 0.840 ms | 0.840 ms | 0.022 ms | 39
Totals | 1,549.320 ms | 1,549.320 ms | 0.181 ms | 8580
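Reading the two tables together: the gap is concentrated in Convolution (on Linux, many pointwise ops are additionally fused into _FusedOp, which the Windows run lacks, so per-operator rows are not directly comparable). A quick sanity check of the ratios, with the totals copied from the tables above:

```python
# Numbers copied from the two profiler tables above (milliseconds).
win = {"Convolution": 3134.914, "Total": 3566.773}
linux = {"Convolution": 198.481, "Total": 1549.320}

# Convolution alone is ~15.8x slower on Windows than on Linux.
conv_ratio = win["Convolution"] / linux["Convolution"]

# Overall wall time differs by ~2.3x (pointwise-op fusion on Linux
# moves work into _FusedOp, so the total is the fairer comparison).
total_ratio = win["Total"] / linux["Total"]

print(round(conv_ratio, 1), round(total_ratio, 1))
```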
## To Reproduce
Official MXNet 1.6.0 release.
Run inference with an ssd_mobilenet1.0_custom model at 300x300 input on GPU.
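For reference, a minimal sketch of how a per-operator breakdown like the tables above can be collected with MXNet's built-in profiler. `profile_model` is a hypothetical helper, not part of the original report; it needs a GPU-enabled MXNet build, so the import is done lazily:

```python
def profile_model(net, x):
    """Run one forward pass of `net` on `x` and return MXNet's
    aggregated per-operator profiler dump as a string."""
    import mxnet as mx  # lazy import: requires a GPU-enabled MXNet build

    mx.profiler.set_config(profile_all=True, aggregate_stats=True)
    mx.profiler.set_state('run')
    y = net(x)
    mx.nd.waitall()  # wait until all queued GPU work has finished
    mx.profiler.set_state('stop')
    return mx.profiler.dumps()
```

With a net built as in the snippets below, something like `print(profile_model(net, mx.nd.zeros((1, 3, 300, 300), ctx=mx.gpu())))` would print a table of the same shape as the ones above.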
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [incubator-mxnet] chinakook closed issue #17907: Depthwise in windows is 10 times slower than linux on gpu
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-mxnet/issues/17907
[GitHub] [incubator-mxnet] chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-mxnet/issues/17907#issuecomment-604265587
It's related to the cuDNN optimization on Windows with Pascal cards.
[GitHub] [incubator-mxnet] chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu
Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/incubator-mxnet/issues/17907#issuecomment-603938898
Here is another comparison on Windows: PyTorch takes 2.0 s while MXNet takes 10.2 s. I think this has been a bug for a long time.
MXNet version:
```python
import os
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
import time
import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn
from gluoncv.model_zoo import get_model
ctx = mx.gpu()
net = get_model('mobilenetv2_1.0', norm_layer=gluon.nn.BatchNorm)
net.initialize()
net.collect_params().reset_ctx(ctx)
s = time.time()
for i in range(50):
    x = mx.nd.random.uniform(shape=(1, 3, 512, 512), ctx=ctx)
    t = time.time()
    y = net(x)
    mx.nd.waitall()
    print(time.time() - t)
print('TOTAL TIME: ', time.time() - s)
```
```
0.2889983654022217
0.15599989891052246
0.14120268821716309
0.1549980640411377
0.14800024032592773
0.1549973487854004
0.1419973373413086
0.16100192070007324
0.15399909019470215
0.14299798011779785
0.1490001678466797
0.17000079154968262
0.1530005931854248
0.14499974250793457
0.1569969654083252
0.15002942085266113
0.14699625968933105
0.14600133895874023
0.143998384475708
0.15400242805480957
0.1439976692199707
0.14451003074645996
0.16103625297546387
0.15851068496704102
0.15300440788269043
0.15399932861328125
0.15399956703186035
0.14400243759155273
0.15401935577392578
0.14500117301940918
0.14951753616333008
0.14799976348876953
0.14800000190734863
0.15600085258483887
0.1529989242553711
0.14699888229370117
0.14899921417236328
0.1512279510498047
0.1525120735168457
0.1549992561340332
0.16200017929077148
0.1529998779296875
0.1510009765625
0.14804387092590332
0.14800000190734863
0.15600061416625977
0.15230464935302734
0.15199899673461914
0.14699792861938477
0.1289997100830078
TOTAL TIME: 10.248228788375854
```
PyTorch version:
```python
import time
import torch
import torchvision
torch.backends.cudnn.benchmark=False
net = torchvision.models.mobilenet_v2()
net.cuda()
net.eval()
s = time.time()
for i in range(50):
    t = time.time()
    x = torch.rand([1, 3, 512, 512]).cuda()
    y = net(x)
    print(time.time() - t)
print('TOTAL TIME: ', time.time() - s)
```
```
0.9051487445831299
0.04097485542297363
0.019997835159301758
0.018999099731445312
0.023026704788208008
0.021998167037963867
0.020003795623779297
0.0209958553314209
0.020031213760375977
0.020966291427612305
0.019999980926513672
0.022031784057617188
0.019968032836914062
0.023028850555419922
0.020004987716674805
0.01996612548828125
0.022998332977294922
0.020999431610107422
0.02000117301940918
0.019997119903564453
0.02300119400024414
0.02200031280517578
0.01899886131286621
0.01999974250793457
0.021999835968017578
0.02000284194946289
0.02000141143798828
0.02000117301940918
0.02099919319152832
0.020000457763671875
0.021001338958740234
0.020998477935791016
0.020000219345092773
0.020998477935791016
0.022002458572387695
0.02502727508544922
0.02000284194946289
0.021997690200805664
0.021001100540161133
0.024999141693115234
0.0299990177154541
0.02599787712097168
0.029999256134033203
0.029999256134033203
0.02700185775756836
0.02520275115966797
0.02800154685974121
0.032999277114868164
0.02400040626525879
0.02900218963623047
TOTAL TIME: 2.065340518951416
```
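In both runs the first iteration is far slower than the rest (framework and cuDNN initialization), so steady-state time per iteration is the fairer comparison. Note also that the PyTorch loop never synchronizes the GPU inside the loop, so its per-iteration prints can undercount GPU time, though the overall wall-clock total is still valid. A quick calculation from the totals and first-iteration times printed above:

```python
# Totals and first-iteration times copied from the outputs above (seconds).
mxnet_total, mxnet_first = 10.248, 0.289
torch_total, torch_first = 2.065, 0.905

iters = 50
# Exclude the first (warm-up) iteration from the per-iteration average.
mxnet_steady = (mxnet_total - mxnet_first) / (iters - 1)  # ~0.203 s/iter
torch_steady = (torch_total - torch_first) / (iters - 1)  # ~0.024 s/iter

print(round(mxnet_steady, 3), round(torch_steady, 3),
      round(mxnet_steady / torch_steady, 1))
```

Even after discounting warm-up, MXNet is roughly 8-9x slower per iteration on this Windows machine, consistent with the Convolution-dominated profile in the tables above.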