Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/25 15:03:55 UTC

[GitHub] [incubator-mxnet] chinakook opened a new issue #17907: Depthwise in windows is 10 times slower than linux on gpu

chinakook opened a new issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907
 
 
   ## Description
   Depthwise convolution on Windows is about 10x slower than on Linux on GPU.
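For context, depthwise separable convolutions (the MobileNet building block) are used because they are far cheaper than standard convolutions, which makes a 10x platform regression in them especially costly. A rough FLOP comparison, with illustrative shapes that are not taken from the profiled model:

```python
# Rough FLOP estimate for one 3x3 convolution layer, standard vs. depthwise
# separable. Shapes are illustrative, not taken from the profiled model.
def conv_flops(h, w, c_in, c_out, k=3):
    # Standard conv: every output channel mixes every input channel.
    return h * w * c_out * c_in * k * k

def depthwise_separable_flops(h, w, c_in, c_out, k=3):
    # Depthwise kxk (one filter per channel) followed by a 1x1 pointwise mix.
    return h * w * c_in * k * k + h * w * c_in * c_out

std = conv_flops(56, 56, 128, 128)
sep = depthwise_separable_flops(56, 56, 128, 128)
print(round(std / sep, 1))  # the separable form is ~8.4x cheaper here
```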
   
   ### Error Message
   Windows:
   
   
   Operator | Total Time | Total Time | Avg Time | Count
   -- | -- | -- | -- | --
   SyncCopyCPU2GPU | 12.673 ms | 12.673 ms | 0.325 ms | 39
   SyncCopyGPU2CPU | 10.121 ms | 10.121 ms | 0.087 ms | 117
   DeleteVariable | 1.603 ms | 1.603 ms | 0.002 ms | 1014
   BatchNorm | 90.036 ms | 90.036 ms | 0.066 ms | 1365
   Convolution | 3,134.914 ms | 3,134.914 ms | 1.710 ms | 1833
   Activation | 75.352 ms | 75.352 ms | 0.055 ms | 1365
   transpose | 30.556 ms | 30.556 ms | 0.060 ms | 507
   Flatten | 7.007 ms | 7.007 ms | 0.015 ms | 468
   slice_axis | 19.195 ms | 19.195 ms | 0.062 ms | 312
   _mul_scalar | 23.824 ms | 23.824 ms | 0.056 ms | 429
   zeros_like | 3.767 ms | 3.767 ms | 0.048 ms | 78
   where | 3.565 ms | 3.565 ms | 0.046 ms | 78
   slice_like | 14.537 ms | 14.537 ms | 0.062 ms | 234
   Reshape | 12.183 ms | 12.183 ms | 0.015 ms | 819
   SliceChannel | 5.935 ms | 5.935 ms | 0.076 ms | 78
   _plus_scalar | 11.447 ms | 11.447 ms | 0.059 ms | 195
   exp | 4.079 ms | 4.079 ms | 0.052 ms | 78
   Concat | 17.887 ms | 17.887 ms | 0.066 ms | 273
   broadcast_mul | 8.639 ms | 8.639 ms | 0.055 ms | 156
   _div_scalar | 3.986 ms | 3.986 ms | 0.051 ms | 78
   _contrib_box_nms | 48.896 ms | 48.896 ms | 1.254 ms | 39
   softmax | 2.686 ms | 2.686 ms | 0.069 ms | 39
   _greater_scalar | 2.339 ms | 2.339 ms | 0.060 ms | 39
   ones_like | 1.908 ms | 1.908 ms | 0.049 ms | 39
   broadcast_add | 4.345 ms | 4.345 ms | 0.056 ms | 78
   elemwise_add | 3.568 ms | 3.568 ms | 0.046 ms | 78
   broadcast_sub | 5.719 ms | 5.719 ms | 0.147 ms | 39
   broadcast_div | 2.637 ms | 2.637 ms | 0.068 ms | 39
   elemwise_sub | 3.369 ms | 3.369 ms | 0.043 ms | 78
   Totals | 3,566.773 ms | 3,566.773 ms | 0.357 ms | 9984
   
   
   
   Linux:
   
   Operator | Total Time | Total Time | Avg Time | Count
   -- | -- | -- | -- | --
   SyncCopyGPU2CPU | 4.928 ms | 4.928 ms | 0.042 ms | 117
   SyncCopyCPU2GPU | 11.269 ms | 11.269 ms | 0.289 ms | 39
   Activation | 24.682 ms | 24.682 ms | 0.022 ms | 1131
   Convolution | 198.481 ms | 198.481 ms | 0.108 ms | 1833
   BatchNorm | 40.352 ms | 40.352 ms | 0.030 ms | 1365
   _FusedOp | 1,188.433 ms | 1,188.433 ms | 1.524 ms | 780
   transpose | 11.064 ms | 11.064 ms | 0.022 ms | 507
   Flatten | 2.973 ms | 2.973 ms | 0.006 ms | 468
   softmax | 1.125 ms | 1.125 ms | 0.029 ms | 39
   Concat | 9.672 ms | 9.672 ms | 0.035 ms | 273
   where | 1.262 ms | 1.262 ms | 0.016 ms | 78
   slice_axis | 5.564 ms | 5.564 ms | 0.020 ms | 273
   zeros_like | 0.599 ms | 0.599 ms | 0.015 ms | 39
   DeleteVariable | 3.213 ms | 3.213 ms | 0.003 ms | 1053
   Reshape | 0.605 ms | 0.605 ms | 0.005 ms | 117
   broadcast_mul | 3.145 ms | 3.145 ms | 0.020 ms | 156
   broadcast_add | 1.642 ms | 1.642 ms | 0.021 ms | 78
   broadcast_sub | 3.018 ms | 3.018 ms | 0.077 ms | 39
   broadcast_div | 1.526 ms | 1.526 ms | 0.039 ms | 39
   SliceChannel | 2.744 ms | 2.744 ms | 0.035 ms | 78
   _contrib_box_nms | 32.184 ms | 32.184 ms | 0.825 ms | 39
   _greater_scalar | 0.840 ms | 0.840 ms | 0.022 ms | 39
   Totals | 1,549.320 ms | 1,549.320 ms | 0.181 ms | 8580
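Reading the two tables together: on Windows, Convolution alone accounts for nearly 90% of the run, while on Linux much of the elementwise work is grouped under `_FusedOp`, so a direct per-operator comparison is only approximate. A quick check using the totals above:

```python
# Numbers copied from the two profiler tables above (milliseconds).
win_conv, win_total = 3134.914, 3566.773
lin_conv, lin_total = 198.481, 1549.320

conv_share = win_conv / win_total * 100  # Convolution's share of the Windows run
conv_ratio = win_conv / lin_conv         # raw ratio; Linux side excludes _FusedOp time
total_ratio = win_total / lin_total      # end-to-end gap in this profile

print(round(conv_share, 1))   # ~87.9
print(round(conv_ratio, 1))   # ~15.8
print(round(total_ratio, 1))  # ~2.3
```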
   
   
   
   ## To Reproduce
   mxnet 1.6.0 official build.
   Run inference with an ssd_mobienet1.0_custom model at a 300x300 input size on GPU.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] chinakook closed issue #17907: Depthwise in windows is 10 times slower than linux on gpu

chinakook closed issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907
 
 
   


[GitHub] [incubator-mxnet] chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu

chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907#issuecomment-604265587
 
 
   It's related to cuDNN optimization on Windows with Pascal cards.
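If the slowdown comes from cuDNN picking a poor convolution algorithm on this platform, one thing worth trying (an assumption on my part, not a confirmed fix) is leaving cuDNN autotuning enabled; the benchmark script in this thread explicitly disables it:

```python
import os

# MXNET_CUDNN_AUTOTUNE_DEFAULT=1 asks MXNet to benchmark the available cuDNN
# algorithms for each convolution shape and pick the fastest, instead of a
# fixed default. It must be set before `import mxnet`.
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1'

print(os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'])
```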


[GitHub] [incubator-mxnet] chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu

chinakook commented on issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907#issuecomment-603938898
 
 
   Here is another comparison on Windows: PyTorch takes 2.0 s while MXNet takes 10.2 s. I think this has been a bug for a long time.
   MXNet version:
   ```python
   import os
   os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'
   import time
   import mxnet as mx
   from mxnet import gluon
   from mxnet.gluon import nn
   from gluoncv.model_zoo import get_model
   
   ctx = mx.gpu()
   net = get_model('mobilenetv2_1.0', norm_layer=gluon.nn.BatchNorm)
   net.initialize()
   
   net.collect_params().reset_ctx(ctx)
   
   s = time.time()
   for i in range(50):
       x = mx.nd.random.uniform(shape=(1,3,512,512), ctx=ctx)
   
       t = time.time()
       y = net(x)
       mx.nd.waitall()
       print(time.time() - t)
   
   print('TOTAL TIME: ', time.time() - s)
   ```
   ```
   0.2889983654022217
   0.15599989891052246
   0.14120268821716309
   0.1549980640411377
   0.14800024032592773
   0.1549973487854004
   0.1419973373413086
   0.16100192070007324
   0.15399909019470215
   0.14299798011779785
   0.1490001678466797
   0.17000079154968262
   0.1530005931854248
   0.14499974250793457
   0.1569969654083252
   0.15002942085266113
   0.14699625968933105
   0.14600133895874023
   0.143998384475708
   0.15400242805480957
   0.1439976692199707
   0.14451003074645996
   0.16103625297546387
   0.15851068496704102
   0.15300440788269043
   0.15399932861328125
   0.15399956703186035
   0.14400243759155273
   0.15401935577392578
   0.14500117301940918
   0.14951753616333008
   0.14799976348876953
   0.14800000190734863
   0.15600085258483887
   0.1529989242553711
   0.14699888229370117
   0.14899921417236328
   0.1512279510498047
   0.1525120735168457
   0.1549992561340332
   0.16200017929077148
   0.1529998779296875
   0.1510009765625
   0.14804387092590332
   0.14800000190734863
   0.15600061416625977
   0.15230464935302734
   0.15199899673461914
   0.14699792861938477
   0.1289997100830078
   TOTAL TIME:  10.248228788375854
   ```
   
   PyTorch version:
   ```python
   import time
   import torch
   import torchvision
   
   torch.backends.cudnn.benchmark=False
   
   net = torchvision.models.mobilenet_v2()
   net.cuda()
   net.eval()
   
   s = time.time()
   for i in range(50):
       t = time.time()
       x = torch.rand([1,3,512,512]).cuda()
       y = net(x)
       print(time.time() - t)
   
   print('TOTAL TIME: ', time.time() - s)
   ```
   
   ```
   0.9051487445831299
   0.04097485542297363
   0.019997835159301758
   0.018999099731445312
   0.023026704788208008
   0.021998167037963867
   0.020003795623779297
   0.0209958553314209
   0.020031213760375977
   0.020966291427612305
   0.019999980926513672
   0.022031784057617188
   0.019968032836914062
   0.023028850555419922
   0.020004987716674805
   0.01996612548828125
   0.022998332977294922
   0.020999431610107422
   0.02000117301940918
   0.019997119903564453
   0.02300119400024414
   0.02200031280517578
   0.01899886131286621
   0.01999974250793457
   0.021999835968017578
   0.02000284194946289
   0.02000141143798828
   0.02000117301940918
   0.02099919319152832
   0.020000457763671875
   0.021001338958740234
   0.020998477935791016
   0.020000219345092773
   0.020998477935791016
   0.022002458572387695
   0.02502727508544922
   0.02000284194946289
   0.021997690200805664
   0.021001100540161133
   0.024999141693115234
   0.0299990177154541
   0.02599787712097168
   0.029999256134033203
   0.029999256134033203
   0.02700185775756836
   0.02520275115966797
   0.02800154685974121
   0.032999277114868164
   0.02400040626525879
   0.02900218963623047
   TOTAL TIME:  2.065340518951416
   ```
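Excluding the first (warm-up) iteration, the steady-state gap can be read straight off the numbers above. One caveat: the PyTorch loop never calls `torch.cuda.synchronize()`, so its per-iteration prints can understate pending GPU work, while the MXNet loop forces a sync with `mx.nd.waitall()`; the end-to-end totals are still broadly comparable.

```python
# Steady-state per-iteration averages, excluding the first (warm-up) pass,
# computed from the totals and first-iteration times printed above.
mxnet_avg   = (10.248228788375854 - 0.2889983654022217) / 49
pytorch_avg = (2.065340518951416 - 0.9051487445831299) / 49

print(round(mxnet_avg, 3))    # ~0.203 s per iteration
print(round(pytorch_avg, 3))  # ~0.024 s per iteration
print(round(mxnet_avg / pytorch_avg, 1))  # steady-state gap, ~8.6x
```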
