Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/03/25 13:17:05 UTC

[GitHub] [incubator-mxnet] xrmzju opened a new issue #14516: can't achieve linear speed-up with multiple GPUs

xrmzju opened a new issue #14516: can't achieve linear speed-up with multiple GPUs
URL: https://github.com/apache/incubator-mxnet/issues/14516
 
 
   Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as a checklist of the essential information needed for most technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in whatever form you believe is best.
   
   For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io 
   
   ## Description
   I'm trying to run the ImageNet benchmark with ResNet-18 on CIFAR-10 (using a 3,32,32 image shape instead); here are my results. I'm wondering:
   1) Why is the speedup not linear?
   2) Why does the speedup ratio increase with the batch size?
   3) What factors affect the training speed?
   
   batch size | kv_store | dtype | num of GPUs | speed (samples/sec) | speedup ratio
   -- | -- | -- | -- | -- | --
   256 | device | float16 | 1 | 6982.74 | 1
     |   |   | 2 | 9908.78 | 0.709519472
     |   |   | 4 | 9128.08 | 0.460605645
     |   | float32 | 1 | 4192.01 | 1
     |   |   | 2 | 6686.33 | 0.797508832
     |   |   | 4 | 7094.3 | 0.530507767
     | local | float16 | 1 | 6866.89 | 1
     |   |   | 2 | 7995.19 | 0.582155095
     |   |   | 4 | 5417.86 | 0.197245769
     |   | float32 | 1 | 4216.72 | 1
     |   |   | 2 | 6814.4 | 0.808021401
     |   |   | 4 | 6005.8 | 0.356070595
   512 | device | float16 | 1 | 7827.6 | 1
     |   |   | 2 | 13258 | 0.84687516
     |   |   | 4 | 13203 | 0.421680975
     |   | float32 | 1 | 4274.25 | 1
     |   |   | 2 | 8046.32 | 0.941255191
     |   |   | 4 | 9978.2 | 0.583622858
     | local | float16 | 1 | 7685.1 | 1
     |   |   | 2 | 12535.5 | 0.815571691
     |   |   | 4 | 10960.6 | 0.356553591
     |   | float32 | 1 | 4220.42 | 1
     |   |   | 2 | 8058.09 | 0.954654987
     |   |   | 4 | 10418.9 | 0.617171988
   1024 | device | float16 | 1 | 5465.05 | 1
     |   |   | 2 | 15415.4 | 1.410362211
     |   |   | 4 | 21156.9 | 0.967827376
     |   | float32 | 1 | 3835.76 | 1
     |   |   | 2 | 8428.88 | 1.410362211
     |   |   | 4 | 12995.3 | 0.967827376
     | local | float16 | 1 | 5473.84 | 1
     |   |   | 2 | 15174.3 | 1.386074492
     |   |   | 4 | 18561.5 | 0.847736689
     |   | float32 | 1 | 3830.2 | 1
     |   |   | 2 | 8426.37 | 1.099990862
     |   |   | 4 | 14106.9 | 0.920767845
   2048 | device | float16 | 1 | 4515.36 | 1
     |   |   | 2 | 10938 | 1.211199107
     |   |   | 4 | 28013.4 | 1.5510059
     |   | float32 | 1 | 3387.51 | 1
     |   |   | 2 | 7633.34 | 1.211199107
     |   |   | 4 | 15375.4 | 1.5510059
     | local | float16 | 1 | 4494.73 | 1
     |   |   | 2 | 10821.7 | 1.203820919
     |   |   | 4 | 25833.7 | 1.436888311
     |   | float32 | 1 | 3382.08 | 1
     |   |   | 2 | 7639.2 | 1.129364178
     |   |   | 4 | 16075.5 | 1.18828502
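
   For reference, a value of 1 in the last column means perfectly linear scaling. A minimal sketch of how such a per-GPU scaling efficiency can be computed (the function name `scaling_efficiency` is mine, not from the benchmark script), reproducing the batch-256 / device / float16 / 2-GPU row above:

```python
def scaling_efficiency(speed_n, num_gpus, speed_1):
    """Per-GPU scaling efficiency: speed with N GPUs divided by N times
    the single-GPU speed. 1.0 means perfectly linear speed-up."""
    return speed_n / (num_gpus * speed_1)

# Batch 256, kv_store=device, float16 (numbers taken from the table above):
print(scaling_efficiency(9908.78, 2, 6982.74))  # ~0.7095, matching the table
```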
   
   ## Environment info (Required)
   ```shell
   ----------Python Info----------
   Version      : 3.5.2
   Compiler     : GCC 5.4.0 20160609
   Build        : ('default', 'Nov 12 2018 13:43:14')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 19.0.3
   Directory    : /usr/local/lib/python3.5/dist-packages/pip
   ----------MXNet Info-----------
   Version      : 1.4.0
   Directory    : /usr/local/lib/python3.5/dist-packages/mxnet
   Commit Hash   : a03d59ed867ba334d78d61246a1090cd1868f5da
   ----------System Info----------
   Platform     : Linux-4.1.51-x86_64-with-Ubuntu-16.04-xenial
   system       : Linux
   node         : mxnet-no-nvlink-4-74c6599dc6-f854z
   release      : 4.1.51
   version      : #1 SMP Tue Feb 12 00:00:00 UTC 2019
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:          x86_64
   CPU op-mode(s):        32-bit, 64-bit
   Byte Order:            Little Endian
   CPU(s):                16
   On-line CPU(s) list:   0-15
   Thread(s) per core:    1
   Core(s) per socket:    1
   Socket(s):             16
   NUMA node(s):          1
   Vendor ID:             GenuineIntel
   CPU family:            6
   Model:                 79
   Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
   Stepping:              1
   CPU MHz:               2599.996
   BogoMIPS:              5199.99
   Hypervisor vendor:     KVM
   Virtualization type:   full
   L1d cache:             32K
   L1i cache:             32K
   L2 cache:              4096K
   L3 cache:              16384K
   NUMA node0 CPU(s):     0-15
   Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch arat fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
   ----------Network Test----------
   Setting timeout: 10
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0346 sec, LOAD: 2.9720 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 3.3387 sec, LOAD: 1.8584 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 1.2275 sec, LOAD: 0.8845 sec.
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0022 sec, LOAD: 0.9994 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0532 sec, LOAD: 2.3628 sec.
   Error open PYPI: https://pypi.python.org/pypi/pip, <urlopen error _ssl.c:629: The handshake operation timed out>, DNS finished in 0.029279708862304688 sec.
   ```
   ## Steps to reproduce
   (Paste the commands you ran that produced the error.)
   
   1. Run the command:
   ```shell
   python train_imagenet.py --image-shape 3,32,32 --batch-size 512 --gpus 0,1 --num-epochs 1 --benchmark 1 --dtype=float32 --kv-store=device --num-layers 18 --network resnet
   ```
   2. Vary the parameters above (batch size, number of GPUs, dtype, kv-store).
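
   The speed numbers in the tables can be obtained by averaging the per-batch throughput lines the benchmark prints. A minimal sketch, assuming the usual `Speed: N samples/sec` log format (the helper name `mean_speed` and the sample log are mine):

```python
import re

def mean_speed(log_text):
    """Average all 'Speed: N samples/sec' throughput lines in a training log."""
    speeds = [float(m) for m in re.findall(r"Speed:\s*([\d.]+)\s*samples/sec", log_text)]
    return sum(speeds) / len(speeds) if speeds else 0.0

# Hypothetical excerpt in the shape of MXNet's periodic speedometer output:
sample_log = """\
INFO:root:Epoch[0] Batch [20]  Speed: 7827.60 samples/sec
INFO:root:Epoch[0] Batch [40]  Speed: 7685.10 samples/sec
"""
print(mean_speed(sample_log))  # average of the two sample lines (~7756.35)
```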
   
   ## What have you tried to solve it?
   
   I tried increasing the image shape to 3,224,224, and the results became much better, but I'm still wondering why the speedup ratio can be greater than 1.
   
   batch size | kv_store | dtype | num of GPUs | speed (samples/sec) | speedup ratio
   -- | -- | -- | -- | -- | --
   256 | device | float16 | 1 | 1389.26 | 1
     |   |   | 2 | 2923.71 | 1.052254
     |   |   | 4 | 4735.49 | 0.852161
     |   | float32 | 1 | 897.35 | 1
     |   |   | 2 | 1901.02 | 1.059241
     |   |   | 4 | 3152.52 | 0.878286
     | local | float16 | 1 | 1390.71 | 1
     |   |   | 2 | 2921.01 | 1.050187
     |   |   | 4 | 4033.09 | 0.725006
     |   | float32 | 1 | 899.573 | 1
     |   |   | 2 | 1904.76 | 1.058702
     |   |   | 4 | 3216.54 | 0.893907
   512 | device | float16 | 1 | 1094.68 | 1
     |   |   | 2 | 2722.44 | 1.243487
     |   |   | 4 | 5097.52 | 1.164158
     |   | float32 | 1 | 830.413 | 1
     |   |   | 2 | 1795.82 | 1.081281
     |   |   | 4 | 3578.45 | 1.07731
     | local | float16 | 1 | 1094.4 | 1
     |   |   | 2 | 2720.68 | 1.243001
     |   |   | 4 | 5066.31 | 1.157326
     |   | float32 | 1 | 828.245 | 1
     |   |   | 2 | 1801.51 | 1.087547
     |   |   | 4 | 3572.59 | 1.078361
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services