Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2019/11/19 10:23:36 UTC

[GitHub] [singa] chrishkchris opened a new pull request #562: SINGA-487 Add support of gradient compression to half precision

URL: https://github.com/apache/singa/pull/562
 
 
   In this PR, I add an API in opt.py for using half precision (FP16) in gradient transfer.
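   
   For illustration, below is a minimal sketch of the idea behind half-precision gradient transfer: cast each FP32 gradient to FP16 before the collective communication, then cast back to FP32 before the parameter update. The all_reduce_fp16 helper and its simulated averaging are hypothetical stand-ins for the NCCL all-reduce inside the distributed optimizer, not SINGA's actual API.
   
       # Minimal NumPy sketch of FP16 gradient transfer (hypothetical helper,
       # not SINGA's actual API). The real change performs the cast around the
       # all-reduce inside the distributed optimizer in opt.py.
       import numpy as np
   
       def all_reduce_fp16(grad_fp32, world_size=2):
           # Compress: cast the FP32 gradient to FP16 before communication,
           # halving the number of bytes sent between workers.
           grad_fp16 = grad_fp32.astype(np.float16)
           # Placeholder for the collective all-reduce; here we simply average
           # identical copies of the gradient from `world_size` workers.
           reduced_fp16 = (grad_fp16 * world_size) / np.float16(world_size)
           # Decompress: cast back to FP32 before applying the SGD update.
           return reduced_fp16.astype(np.float32)
   
       g = np.random.randn(1024).astype(np.float32)
       g_avg = all_reduce_fp16(g)
       print("max abs rounding error:", np.abs(g - g_avg).max())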
   
   Here is the training accuracy test using 16-bit gradient transfer:
   ubuntu@ip-172-31-29-33:~/singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 mnist_dist.py
   Starting Epoch 0:
   Training loss = 790.405762, training accuracy = 0.715144
   Evaluation accuracy = 0.928557, Elapsed Time = 0.675930s
   Starting Epoch 1:
   Training loss = 252.329041, training accuracy = 0.915181
   Evaluation accuracy = 0.961143, Elapsed Time = 0.545467s
   Starting Epoch 2:
   Training loss = 181.895905, training accuracy = 0.938618
   Evaluation accuracy = 0.965461, Elapsed Time = 0.554351s
   Starting Epoch 3:
   Training loss = 136.416214, training accuracy = 0.954577
   Evaluation accuracy = 0.970806, Elapsed Time = 0.542592s
   Starting Epoch 4:
   Training loss = 117.712143, training accuracy = 0.960804
   Evaluation accuracy = 0.976460, Elapsed Time = 0.543181s
   Starting Epoch 5:
   Training loss = 102.698730, training accuracy = 0.965562
   Evaluation accuracy = 0.976974, Elapsed Time = 0.541852s
   Starting Epoch 6:
   Training loss = 93.638481, training accuracy = 0.969401
   Evaluation accuracy = 0.978207, Elapsed Time = 0.543727s
   Starting Epoch 7:
   Training loss = 88.651802, training accuracy = 0.970536
   Evaluation accuracy = 0.975123, Elapsed Time = 0.541136s
   Starting Epoch 8:
   Training loss = 80.523178, training accuracy = 0.973508
   Evaluation accuracy = 0.983244, Elapsed Time = 0.544187s
   Starting Epoch 9:
   Training loss = 76.868576, training accuracy = 0.974209
   Evaluation accuracy = 0.982113, Elapsed Time = 0.544531s
   
   There seems to be no difference in training accuracy on the MNIST dataset. But for other, more complex networks/datasets, I added an option for gradient clipping to assist training.
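   
   As a rough illustration of that clipping option, the snippet below clamps every gradient element into a fixed range before it is cast to FP16; the threshold value and helper name are illustrative only, not the exact API added in opt.py. Bounding the gradients this way helps keep them inside the narrower FP16 range on harder networks/datasets.
   
       # Minimal sketch of element-wise gradient clipping (threshold and helper
       # name are illustrative, not SINGA's actual API).
       import numpy as np
   
       def clip_gradient(grad, clip_value=2.5):
           # Clamp every element into [-clip_value, clip_value] before the
           # FP16 cast so large values cannot overflow half precision.
           return np.clip(grad, -clip_value, clip_value)
   
       g = np.array([0.1, -3.0, 7.5], dtype=np.float32)
       print(clip_gradient(g))  # -> [ 0.1 -2.5  2.5]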
