Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/06/28 20:31:38 UTC

[GitHub] rahul003 opened a new issue #11469: Performance regression in augmentation

URL: https://github.com/apache/incubator-mxnet/issues/11469
 
 
   ## Description
   There's a big performance regression: throughput drops from ~5000 samples/sec to ~3000 samples/sec for ResNet-50 on ImageNet. This is linked to PR https://github.com/apache/incubator-mxnet/pull/11027
   
   What the PR itself tries to do is not problematic. I can get ~5000 samples/sec with an older commit on that PR, d37f3a3e63cef5a79c3e673cec30e70f8bf83b3e from May 24. But in the form in which it was merged, there's a big slowdown.
   
   ## Environment info (Required)
   
   Package used (Python/R/Scala/Julia): Python 3
   
   ## Build info (Required if built from source)
   pip nightly (mxnet-cu90-1.3.0b20180627), as well as builds from source on master at any commit after the above PR was merged
   
   MXNet commit hash: N/A
   
   Build config: Tried with and without USE_LIBJPEG_TURBO. Enabling it increases the speed a bit (~3500 samples/sec), but that is still much slower than before. USE_CUDA and USE_CUDNN were also enabled.
   
   ## Error Message:
   N/A
   
   ## Steps to reproduce
   ```
   python train_imagenet.py \
     --gpus 0,1,2,3,4,5,6,7 \
     --batch-size 2048 \
     --dtype float16 \
     --network resnet-v1b \
     --data-nthreads 40 \
     --optimizer sgd \
     --data-train /media/ramdisk/pass-through/train-passthrough.rec \
     --data-train-idx /media/ramdisk/pass-through/train-passthrough.idx \
     --data-val /media/ramdisk/pass-through/val-passthrough.rec \
     --data-val-idx /media/ramdisk/pass-through/val-passthrough.idx
   ```
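
One possible way to check that the slowdown comes from the data pipeline rather than from the network is to time the data iterator on its own, without any forward or backward pass. The helper below is only a sketch of that idea and was not part of the original run: `measure_throughput` and its parameters are hypothetical names, and it assumes `batches` is any Python iterable that yields one batch per step (for example an `mx.io.ImageRecordIter` pointed at the same `.rec` files, which is an assumption about how one would wire it up).

```python
import time

_DONE = object()  # sentinel marking an exhausted iterator


def measure_throughput(batches, batch_size, warmup=2, max_batches=50,
                       clock=time.perf_counter):
    """Return samples/sec for iterating over `batches`.

    The first `warmup` batches are consumed but not timed, so one-off
    start-up cost (thread pools, file opening) does not skew the result.
    `clock` is injectable so the function can be tested deterministically.
    """
    it = iter(batches)

    # Consume warm-up batches; bail out if the iterator is too short.
    for _ in range(warmup):
        if next(it, _DONE) is _DONE:
            return 0.0

    start = clock()
    timed = 0
    for _ in it:
        timed += 1
        if timed >= max_batches:
            break
    elapsed = clock() - start

    if elapsed <= 0:
        return float("inf")
    return timed * batch_size / elapsed
```

Running this once against the May 24 commit and once against current master, with the same `batch_size` and the same number of decoding threads, should show whether the iterator alone reproduces the gap between ~5000 and ~3000 samples/sec.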
   
   ## What have you tried to solve it?
   I profiled the run with `perf` to see what might be wrong. It looks like OpenCV is causing a wait for some reason; please see figure 3 below.
   
   1. Perf summary with the regression (current build)
   ![image](https://user-images.githubusercontent.com/3457240/42059154-67db4e46-7ad7-11e8-875d-2ce64984cdfc.png)
   
   2. Perf summary from the May 24 commit 
   ![image](https://user-images.githubusercontent.com/3457240/42059170-71a59904-7ad7-11e8-83da-e7701e8dc69a.png)
   
   3. Call graph using perf
   ![image](https://user-images.githubusercontent.com/3457240/42059189-7e1170aa-7ad7-11e8-8f8a-aab7dbfddd2b.png)
   
   
