Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/01 04:19:38 UTC

[GitHub] edfall opened a new issue #14293: SOFTMAX is slow
URL: https://github.com/apache/incubator-mxnet/issues/14293
 
 
   
   ## Description
   While implementing an attention model with MXNet, I use a large number of softmax operations.
   I find that the softmax operator is `too slow`: slower than a convolution of comparable size, and slower than the softmax implementations of other frameworks, such as torch.
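
   For context, each attention step applies one softmax over the score matrix, so an attention-heavy model calls the operator many times per forward pass. A minimal sketch of scaled dot-product attention in MXNet (the shapes and names here are illustrative, not taken from my actual model):
   ```python
   import mxnet as mx

   def dot_product_attention(query, key, value):
       """Illustrative scaled dot-product attention; one softmax per call."""
       # query/key/value: (batch, seq_len, dim)
       scores = mx.nd.batch_dot(query, key, transpose_b=True)  # (batch, seq, seq)
       scores = scores / (query.shape[-1] ** 0.5)               # scale by sqrt(dim)
       weights = mx.nd.softmax(scores, axis=-1)                 # the hot operator
       return mx.nd.batch_dot(weights, value)                   # weighted values

   q = mx.nd.random.randn(8, 64, 128, ctx=mx.gpu(0))
   k = mx.nd.random.randn(8, 64, 128, ctx=mx.gpu(0))
   v = mx.nd.random.randn(8, 64, 128, ctx=mx.gpu(0))
   out = dot_product_attention(q, k, v)
   ```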
   
   ## Minimum reproducible example
   - code:
   ```python
   import mxnet as mx
   import torch

   print('mxnet version: {}'.format(mx.__version__))
   print('torch version: {}'.format(torch.__version__))

   # mxnet version: 1.5.0
   # torch version: 0.4.0

   # --- channel = 128 ---
   n, c, h = 512, 128, 512

   weight = mx.nd.random.randn(c, c, 3, ctx=mx.gpu(0))
   value = mx.nd.random.randn(n, c, h, ctx=mx.gpu(0))
   print('mxnet softmax: ')
   # wait_to_read() blocks until MXNet's asynchronous kernel finishes,
   # so the GPU work is included in the measurement.
   %timeit a = mx.nd.softmax(value, axis=1).wait_to_read()
   print('mxnet conv1d: ')
   %timeit a = mx.nd.Convolution(value, weight, kernel=(3,), num_filter=c, no_bias=True).wait_to_read()

   value_t = torch.randn(n, c, h).cuda(0)
   weight_t = torch.randn(c, c, 3).cuda(0)
   print('torch softmax: ')
   %timeit a = torch.nn.functional.softmax(value_t, 1)
   print('torch conv1d: ')
   %timeit b = torch.nn.functional.conv1d(value_t, weight_t)

   # --- channel = 32 ---
   n, c, h = 512, 32, 512

   weight = mx.nd.random.randn(c, c, 3, ctx=mx.gpu(0))
   value = mx.nd.random.randn(n, c, h, ctx=mx.gpu(0))
   print('mxnet softmax: ')
   %timeit a = mx.nd.softmax(value, axis=1).wait_to_read()
   print('mxnet conv1d: ')
   %timeit a = mx.nd.Convolution(value, weight, kernel=(3,), num_filter=c, no_bias=True).wait_to_read()

   value_t = torch.randn(n, c, h).cuda(0)
   weight_t = torch.randn(c, c, 3).cuda(0)
   print('torch softmax: ')
   %timeit a = torch.nn.functional.softmax(value_t, 1)
   print('torch conv1d: ')
   %timeit b = torch.nn.functional.conv1d(value_t, weight_t)
   
   ```
   - output:
   
   ```
   # channel 128
   mxnet softmax: 5.98 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   mxnet conv1d:  2.96 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   torch softmax: 1.7 ms ± 843 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
   torch conv1d:  5.69 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   # channel 32
   mxnet softmax: 4.41 ms ± 31.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
   mxnet conv1d:  519 µs ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   torch softmax: 365 µs ± 384 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
   torch conv1d:  302 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
   ```
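
   Before drawing conclusions, one variable that may be worth isolating (my own guess, not something measured above) is the reduction axis: `axis=1` is not the innermost axis of the `(n, c, h)` tensor, so the kernel reduces over strided memory. A quick sketch to compare against softmax over the contiguous last axis:
   ```python
   import mxnet as mx

   value = mx.nd.random.randn(512, 128, 512, ctx=mx.gpu(0))

   # softmax over the strided channel axis, as in the benchmark above
   %timeit a = mx.nd.softmax(value, axis=1).wait_to_read()

   # softmax over the contiguous innermost axis, for comparison
   # (mxnet ops return new arrays, so no explicit copy is needed)
   value_last = mx.nd.transpose(value, axes=(0, 2, 1))
   %timeit a = mx.nd.softmax(value_last, axis=-1).wait_to_read()
   ```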
   - The softmax operation is consistently 2-8x slower than a convolution of similar size, and the gap is worst (about 8x) when the channel count is low.

   - Compared with torch, softmax in MXNet is 3-12x slower (see the synchronized timing sketch after this list).

   - So I conclude that softmax here is **really slow**.

   - I also find that dropout is slow; I think these two issues may be related: https://github.com/apache/incubator-mxnet/issues/13825
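
   Since GPU launches are asynchronous in both frameworks, and the torch `%timeit` lines above do not call `torch.cuda.synchronize()`, here is a minimal timing sketch that synchronizes both sides before reading the clock. The `time_gpu` helper is my own illustration, not part of either API:
   ```python
   import time
   import mxnet as mx
   import torch

   def time_gpu(fn, sync, warmup=10, iters=100):
       """Illustrative helper: average wall time of fn() with device sync."""
       for _ in range(warmup):
           fn()
       sync()                                  # drain pending async work
       start = time.perf_counter()
       for _ in range(iters):
           fn()
       sync()                                  # wait for all timed kernels
       return (time.perf_counter() - start) / iters * 1e3  # ms per call

   value = mx.nd.random.randn(512, 128, 512, ctx=mx.gpu(0))
   value_t = torch.randn(512, 128, 512).cuda(0)

   mx_ms = time_gpu(lambda: mx.nd.softmax(value, axis=1), sync=mx.nd.waitall)
   pt_ms = time_gpu(lambda: torch.nn.functional.softmax(value_t, 1),
                    sync=torch.cuda.synchronize)
   print('mxnet softmax: {:.3f} ms, torch softmax: {:.3f} ms'.format(mx_ms, pt_ms))
   ```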
