Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/02/21 02:52:03 UTC

[GitHub] DickJC123 commented on issue #14200: Bulked op segments to allow Variable nodes

DickJC123 commented on issue #14200: Bulked op segments to allow Variable nodes
URL: https://github.com/apache/incubator-mxnet/pull/14200#issuecomment-465841317
 
 
   I measured the performance gains of this PR under two scenarios.
   
   The first scenario was a run across 8 Volta GPUs of a mixed-precision NHWC Resnet50 v1b (also with Horovod and DALI in NVIDIA's MXNet container).  To simulate upstream MXNet with its current bulking, I set this PR's bulking limit to 2 (a minimal sketch of how the limit is toggled follows the results below).  I reasoned that a residual unit has the forward sequence Conv-BN-Relu-Conv-BN-Add-Relu, which would be split into 4 segments (due to the Variable inputs to the Convs and BNs) among 7 nodes, for an average of 1.75 nodes/segment.  The speed-up measured from MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=2 to MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=15 was:
   
   ```
   batchsize  32: 3.8% speedup
   batchsize  64: 5.1% speedup
   batchsize 128: 1.9% speedup
   batchsize 256: 1.2% speedup
   ```
   
   The second scenario was a run across 8 Volta GPUs of a mixed-precision NCHW Inception-v3 (with DALI in NVIDIA's MXNet container).  The speed-up measured from MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=2 to MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=15 was:
   
   ```
   batchsize  32: 6.3% speedup
   batchsize  64: 7.4% speedup
   batchsize 128: 3.8% speedup
   batchsize 256: 2.1% speedup
   ```
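   
   For reference, below is a minimal sketch of how the two configurations can be compared: the same training command is run once with the bulking limit at 2 and once at 15, and end-to-end throughput is compared.  The script name and its flags are hypothetical placeholders (not the actual benchmark command); only the MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN variable comes from MXNet, and it has to be present in the environment before the MXNet process starts.
   
   ```
   # Minimal sketch, not the exact benchmark command.  "train_imagenet.py" and
   # its flags are placeholders for whatever training script is being timed.
   import os
   import subprocess
   
   for max_nodes in ("2", "15"):
       # Copy the current environment and override only the bulking limit.
       env = dict(os.environ, MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN=max_nodes)
       subprocess.run(
           ["python", "train_imagenet.py", "--network", "resnet50_v1b",
            "--batch-size", "256"],
           env=env,
           check=True,
       )
   ```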

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services