Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/24 03:08:46 UTC

[GitHub] gzh1991 opened a new issue #9537: float16 has no performance improvement in training cifar10

URL: https://github.com/apache/incubator-mxnet/issues/9537
 
 
    I ran the example in image-classification to measure the performance improvement from using float16 in the training phase, but got the opposite result. The parameters I used and the corresponding results are listed below:
    1. python3 train_cifar10.py --gpus 0 --disp-batches 10 --batch-size 1024 --data-nthread 4 --dtype float32
   
   [10:21:53] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_train.rec, use 3 threads for decoding..
   [10:21:53] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_val.rec, use 3 threads for decoding..
   [10:21:54] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   INFO:root:Epoch[0] Batch [10]   Speed: 2087.23 samples/sec      accuracy=0.119940
   INFO:root:Epoch[0] Batch [20]   Speed: 2085.22 samples/sec      accuracy=0.181738
   INFO:root:Epoch[0] Batch [30]   Speed: 2081.76 samples/sec      accuracy=0.222070
   INFO:root:Epoch[0] Batch [40]   Speed: 2082.71 samples/sec      accuracy=0.237402
   
   2. python3 train_cifar10.py --gpus 0 --disp-batches 10 --batch-size 1024 --data-nthread 4 --dtype float16
   
   [10:23:57] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_train.rec, use 3 threads for decoding..
   [10:23:57] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_val.rec, use 3 threads for decoding..
   [10:23:58] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   INFO:root:Epoch[0] Batch [10]   Speed: 1570.73 samples/sec      accuracy=0.126332
   INFO:root:Epoch[0] Batch [20]   Speed: 1571.33 samples/sec      accuracy=0.197168
   INFO:root:Epoch[0] Batch [30]   Speed: 1568.59 samples/sec      accuracy=0.233301
   INFO:root:Epoch[0] Batch [40]   Speed: 1570.66 samples/sec      accuracy=0.271680
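
To put a number on the gap, here is a minimal sketch that averages the Speed values from the two logs above and compares them. The helper name `mean_speed` and the regular expression are illustrative assumptions, not part of the MXNet example scripts; the log lines are copied from the runs above.

```python
import re
from statistics import mean

# Speed lines copied verbatim from the two runs above.
FLOAT32_LOG = """\
INFO:root:Epoch[0] Batch [10]   Speed: 2087.23 samples/sec      accuracy=0.119940
INFO:root:Epoch[0] Batch [20]   Speed: 2085.22 samples/sec      accuracy=0.181738
INFO:root:Epoch[0] Batch [30]   Speed: 2081.76 samples/sec      accuracy=0.222070
INFO:root:Epoch[0] Batch [40]   Speed: 2082.71 samples/sec      accuracy=0.237402
"""

FLOAT16_LOG = """\
INFO:root:Epoch[0] Batch [10]   Speed: 1570.73 samples/sec      accuracy=0.126332
INFO:root:Epoch[0] Batch [20]   Speed: 1571.33 samples/sec      accuracy=0.197168
INFO:root:Epoch[0] Batch [30]   Speed: 1568.59 samples/sec      accuracy=0.233301
INFO:root:Epoch[0] Batch [40]   Speed: 1570.66 samples/sec      accuracy=0.271680
"""

# Hypothetical pattern for the "Speed: <number> samples/sec" field.
SPEED_RE = re.compile(r"Speed:\s*([0-9.]+)\s*samples/sec")


def mean_speed(log_text):
    """Return the average samples/sec over all Speed entries in log_text."""
    speeds = [float(value) for value in SPEED_RE.findall(log_text)]
    if not speeds:
        raise ValueError("no Speed entries found in log text")
    return mean(speeds)


fp32 = mean_speed(FLOAT32_LOG)
fp16 = mean_speed(FLOAT16_LOG)
ratio = fp16 / fp32

print(f"float32 mean: {fp32:.2f} samples/sec")
print(f"float16 mean: {fp16:.2f} samples/sec")
print(f"float16 / float32: {ratio:.2f}")
```

Over these four logged intervals per run, float16 throughput is roughly 25% lower than float32.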
   
   I want to know whether this is the expected result, whether it is caused by the 1080 Ti itself, or whether there is some other reason.
   
   ## Environment info (Required)
   MXNet 1.0.0
   GPU GeForce 1080 Ti
   CUDA 8.0
   MXNET_CUDNN_AUTOTUNE_DEFAULT=1
   cuDNN 5.1
   
