Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2017/11/27 22:00:29 UTC

[GitHub] ptrendx commented on issue #8751: Distributed Training has inverse results when imported (8 GPUS is slower than 1!)

URL: https://github.com/apache/incubator-mxnet/issues/8751#issuecomment-347343030
 
 
   @SumNeuron Hi, I am from the NVIDIA frameworks team. (In the future, please use the official DGX customer support channels to reach us; I noticed this issue only by chance.)
   
   Your script in file 2/file 3 has a major difference compared to file 1: you do not change the batch size when going to multiple GPUs. In file 1 you use a batch size of 512 for 8 GPUs (64 per GPU), whereas in file 2/file 3 it is set to 64 total (8 per GPU). This underutilizes the GPUs and makes the computation very inefficient. When I changed batch_size in file 3 to 512, I got results comparable to file 1 (~2s per epoch).
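   
   For reference, here is a minimal sketch of the scaling I mean (not your actual scripts; the network, data, and variable names like num_gpus and per_gpu_batch are placeholders), using the MXNet Module API:
   
   ```python
   import mxnet as mx
   import numpy as np
   
   num_gpus = 8                            # placeholder: number of GPUs on the machine
   per_gpu_batch = 64                      # keep the per-GPU batch constant...
   batch_size = per_gpu_batch * num_gpus   # ...so the total batch grows to 512 on 8 GPUs
   
   # Toy data and a tiny MLP, purely for illustration.
   x = np.random.rand(10000, 100).astype('float32')
   y = np.random.randint(0, 10, size=(10000,)).astype('float32')
   train_iter = mx.io.NDArrayIter(x, y, batch_size=batch_size, shuffle=True)
   
   data = mx.sym.Variable('data')
   net = mx.sym.FullyConnected(data, num_hidden=128)
   net = mx.sym.Activation(net, act_type='relu')
   net = mx.sym.FullyConnected(net, num_hidden=10)
   net = mx.sym.SoftmaxOutput(net, name='softmax')
   
   # Module splits the total batch evenly across the devices in `context`.
   mod = mx.mod.Module(symbol=net, context=[mx.gpu(i) for i in range(num_gpus)])
   mod.fit(train_iter, num_epoch=5, optimizer='sgd',
           optimizer_params={'learning_rate': 0.1})
   ```
   
   The only change I am pointing to is the batch_size line: keep 64 per GPU and scale the total with the number of GPUs, rather than leaving it at 64 total.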
   
   I can't reproduce the difference in time for the single-GPU tests between file 1 and file 2 (I get ~2s in both cases). Which version of the container are you using? 17.11 or earlier?
