Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2019/05/05 23:52:31 UTC
[GitHub] [incubator-mxnet] stu1130 opened a new pull request #14884: [Dependency Update] Upgrade cuDNN & NCCL

URL: https://github.com/apache/incubator-mxnet/pull/14884
 
 
   ## Description ##
   Upgrade the CUDA 9.0/9.2/10.0 builds to the latest cuDNN **7.5.1** and NCCL **2.4.2**.
   
   ## Checklist ##
   Ran three models to compare training throughput: ResNet50 on ImageNet, LSTM on PTB, and MLP on MNIST. Results are shown below.
   Environment: AWS EC2 p3.16xlarge instance with the Deep Learning Base AMI
   All throughput numbers in the tables below are in **samples per second**.
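Throughput in samples per second is the number of training samples processed divided by wall-clock time. The sketch below is only an illustration of that unit: `run_batch` is a hypothetical callable standing in for one training step, and skipping warm-up batches is an assumption, since the PR does not say how the benchmark scripts time their runs.

```python
import time
from typing import Callable


def samples_per_second(samples: int, seconds: float) -> float:
    """Throughput as samples processed per second of wall-clock time."""
    return samples / seconds


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       num_batches: int, warmup: int = 1) -> float:
    """Time `num_batches` calls of a hypothetical `run_batch` step.

    Warm-up calls are run first and excluded from timing (an assumption,
    not taken from the benchmark scripts), because early iterations often
    include one-off setup cost.
    """
    for _ in range(warmup):
        run_batch()
    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch()
    elapsed = time.perf_counter() - start
    return samples_per_second(batch_size * num_batches, elapsed)
```

For example, `samples_per_second(1280, 0.5)` gives 2560.0 samples per second.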
   ### ResNet ###
   **model**: ResNet50
   **dataset**: ImageNet
   **number of GPUs**: 8
   **epochs**: 3 (only to test throughput)
   **preprocess command**: sudo pip install gluoncv==0.2.0b20180625
   **command**: `python mxnet_benchmark/train_imagenet.py --use-rec --batch-size 128 --dtype float32 --num-data-workers 40 --num-epochs 3 --gpus 0,1,2,3,4,5,6,7 --lr 0.05 --last-gamma --mode symbolic --model resnet50_v1b --rec-train /home/ubuntu/data/train-passthrough.rec --rec-train-idx /home/ubuntu/data/train-passthrough.idx --rec-val /home/ubuntu/data/val-passthrough.rec --rec-val-idx /home/ubuntu/data/val-passthrough.idx`
   **github repo**: https://github.com/rahul003/deep-learning-benchmark-mirror.git
   
   | CUDA version | cuDNN 7.5.1/NCCL 2.4.2 (samples/sec) | cuDNN 7.3.1/NCCL 2.3.4 (samples/sec) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10 | 2831.54405 | 2821.9832 | 0.339%  |
   | CUDA 9.2 | 2832.36803 | 2843.28968 | -0.384% |
   | CUDA 9.0| 2815.83939 | 2851.92915 | -1.265% | 
   
   **There is another performance regression with `--batch-size 256 --dtype float16 --mode hybrid`; see #14838 for more details.**
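The Performance Difference column appears to be the relative change of the new stack against the old one, (new - old) / old * 100, so a positive value means cuDNN 7.5.1/NCCL 2.4.2 is faster. That formula is inferred rather than stated in the PR; recomputing the ResNet50 rows with it reproduces the reported percentages to three decimal places.

```python
def perf_diff_pct(new: float, old: float) -> float:
    """Relative throughput change in percent; positive means the new stack is faster."""
    return (new - old) / old * 100.0


# ResNet50 throughputs (samples/sec) copied from the table above:
# (cuDNN 7.5.1/NCCL 2.4.2, cuDNN 7.3.1/NCCL 2.3.4)
resnet = {
    "CUDA 10": (2831.54405, 2821.9832),
    "CUDA 9.2": (2832.36803, 2843.28968),
    "CUDA 9.0": (2815.83939, 2851.92915),
}

for cuda, (new, old) in resnet.items():
    print(f"{cuda}: {perf_diff_pct(new, old):.3f}%")
```

This prints 0.339%, -0.384%, and -1.265% for CUDA 10, 9.2, and 9.0, matching the table.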
   
   ### LSTM ###
   **model**: LSTM
   **dataset**: PTB (Penn Treebank)
   **number of GPUs**: 1
   **epochs**: 10
   **command**:
   `python2 benchmark_driver.py --framework mxnet --task-name mkl_lstm_ptb_symbolic --num-gpus 1 --epochs 10 --metrics-suffix test --kvstore local`
   `python word_language_model/lstm_bucketing.py --num-hidden 650 --num-embed 650 --gpus 0 --epochs 10 --kv-store local`
   
   | CUDA version | cuDNN 7.5.1/NCCL 2.4.2 (samples/sec) | cuDNN 7.3.1/NCCL 2.3.4 (samples/sec) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10 | 847.98222 | 868.28966 | -2.339%  |
   | CUDA 9.2 | 1005.25185 | 1051.06692 | -4.359% |
   | CUDA 9.0 | 1002.59081 | 1028.46962 | -2.516% |
   
   **CUDA 10 has a known performance regression; see #14725 for more details.**
   
   ### MLP ###
   **model**: 3 dense layers with num_hidden=64 and ReLU activation
   **dataset**: MNIST
   **number of GPUs**: 1
   **epochs**: 10
   **command**:
   `python2 benchmark_runner.py --framework mxnet --metrics-policy mlp --task-name mlp --metrics-suffix test --num-gpus 1 --command-to-execute 'python3 mlp.py' --data-set mnist`
   
   | CUDA version | cuDNN 7.5.1/NCCL 2.4.2 (samples/sec) | cuDNN 7.3.1/NCCL 2.3.4 (samples/sec) | Performance Difference |
   |:----------|:------------------------:|:--------------------:|:---------------------:|
   | CUDA 10 | 4192.20685 | 4094.76838 | 2.380% |
   | CUDA 9.2 | 4212.68214 | 4280.69164 | -1.589% |
   | CUDA 9.0| 4232.10159 | 4273.43268 | -0.967%| 
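The MLP is described only as 3 dense layers with num_hidden=64, so its exact layout is not shown here (`mlp.py` is not part of this PR). A minimal sketch, assuming 28x28 MNIST images flattened to 784 inputs, two hidden layers of 64 units with ReLU, and a final 10-unit output layer, counts the parameters of such a network:

```python
from typing import List, Tuple


def dense_param_counts(in_features: int, layer_sizes: List[int]) -> List[Tuple[int, int]]:
    """Return (weights, biases) for each dense layer in a fully connected stack.

    ReLU activations have no parameters, so only the dense layers are counted.
    """
    counts = []
    prev = in_features
    for units in layer_sizes:
        counts.append((prev * units, units))  # prev x units weight matrix, one bias per unit
        prev = units
    return counts


# Assumed layout (hypothetical): 784 inputs -> 64 (ReLU) -> 64 (ReLU) -> 10 outputs.
layers = dense_param_counts(28 * 28, [64, 64, 10])
total = sum(w + b for w, b in layers)
print(layers, total)
```

Under this assumed layout the network has 55,050 parameters, most of them in the first layer.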
   
   ## Comments ##
   @szha @lanking520 @eric-haibin-lin 
