Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/01/24 03:08:46 UTC
[GitHub] gzh1991 opened a new issue #9537: float16 gives no performance improvement when training cifar10
URL: https://github.com/apache/incubator-mxnet/issues/9537
I ran the image-classification example to measure the performance improvement from using float16 in the training phase, but got the opposite result. The commands I used and the corresponding output are listed below:
1. python3 train_cifar10.py --gpus 0 --disp-batches 10 --batch-size 1024 --data-nthread 4 --dtype float32
[10:21:53] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_train.rec, use 3 threads for decoding..
[10:21:53] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_val.rec, use 3 threads for decoding..
[10:21:54] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [10] Speed: 2087.23 samples/sec accuracy=0.119940
INFO:root:Epoch[0] Batch [20] Speed: 2085.22 samples/sec accuracy=0.181738
INFO:root:Epoch[0] Batch [30] Speed: 2081.76 samples/sec accuracy=0.222070
INFO:root:Epoch[0] Batch [40] Speed: 2082.71 samples/sec accuracy=0.237402
2. python3 train_cifar10.py --gpus 0 --disp-batches 10 --batch-size 1024 --data-nthread 4 --dtype float16
[10:23:57] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_train.rec, use 3 threads for decoding..
[10:23:57] src/io/iter_image_recordio_2.cc:170: ImageRecordIOParser2: data/cifar10_val.rec, use 3 threads for decoding..
[10:23:58] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [10] Speed: 1570.73 samples/sec accuracy=0.126332
INFO:root:Epoch[0] Batch [20] Speed: 1571.33 samples/sec accuracy=0.197168
INFO:root:Epoch[0] Batch [30] Speed: 1568.59 samples/sec accuracy=0.233301
INFO:root:Epoch[0] Batch [40] Speed: 1570.66 samples/sec accuracy=0.271680
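To put a number on the gap, here is a minimal sketch that averages the `Speed:` values from the two runs above and compares them. The speed lines are copied from the logs; the helper name `mean_speed` and the regular expression are my own for illustration and are not part of `train_cifar10.py`.

```python
import re
from statistics import mean

# Speed lines copied verbatim from the float32 run above.
FLOAT32_LOG = """\
INFO:root:Epoch[0] Batch [10] Speed: 2087.23 samples/sec accuracy=0.119940
INFO:root:Epoch[0] Batch [20] Speed: 2085.22 samples/sec accuracy=0.181738
INFO:root:Epoch[0] Batch [30] Speed: 2081.76 samples/sec accuracy=0.222070
INFO:root:Epoch[0] Batch [40] Speed: 2082.71 samples/sec accuracy=0.237402
"""

# Speed lines copied verbatim from the float16 run above.
FLOAT16_LOG = """\
INFO:root:Epoch[0] Batch [10] Speed: 1570.73 samples/sec accuracy=0.126332
INFO:root:Epoch[0] Batch [20] Speed: 1571.33 samples/sec accuracy=0.197168
INFO:root:Epoch[0] Batch [30] Speed: 1568.59 samples/sec accuracy=0.233301
INFO:root:Epoch[0] Batch [40] Speed: 1570.66 samples/sec accuracy=0.271680
"""

# Hypothetical pattern matching the "Speed: <number> samples/sec" field.
SPEED_RE = re.compile(r"Speed:\s*([0-9.]+)\s*samples/sec")


def mean_speed(log_text):
    """Return the average samples/sec over all Speed entries in log_text."""
    speeds = [float(value) for value in SPEED_RE.findall(log_text)]
    return mean(speeds)


fp32 = mean_speed(FLOAT32_LOG)
fp16 = mean_speed(FLOAT16_LOG)
ratio = fp16 / fp32

print(f"float32 mean: {fp32:.2f} samples/sec")
print(f"float16 mean: {fp16:.2f} samples/sec")
print(f"float16 / float32: {ratio:.3f}")
```

On these eight log lines, float16 runs at roughly 75% of the float32 throughput, i.e. it is about a quarter slower rather than faster.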
I want to know whether this is the expected result, whether it is caused by the 1080 Ti itself, or whether there is some other reason.
## Environment info (Required)
MXNet 1.0.0
GPU GeForce GTX 1080 Ti
CUDA 8.0
MXNET_CUDNN_AUTOTUNE_DEFAULT=1
cuDNN 5.1