Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2020/11/08 09:18:00 UTC
[GitHub] [singa] chrishkchris commented on pull request #814: SINGA-511 Compatibility of fp16 to distributed optimizer
chrishkchris commented on pull request #814:
URL: https://github.com/apache/singa/pull/814#issuecomment-723550416
1. Operation mode: MPI
```
root@64926e30597f:~/dcsysh/singa/examples/cnn# mpiexec -np 3 python3 train_mpi.py cnn mnist -l 0.015 -p float16 -g
Starting Epoch 0:
Training loss = 655.268921, training accuracy = 0.765792
Evaluation accuracy = 0.929087, Elapsed Time = 2.694215s
Starting Epoch 1:
Training loss = 244.490601, training accuracy = 0.917535
Evaluation accuracy = 0.958033, Elapsed Time = 2.784006s
Starting Epoch 2:
Training loss = 174.696671, training accuracy = 0.941774
Evaluation accuracy = 0.961138, Elapsed Time = 2.736310s
Starting Epoch 3:
Training loss = 141.858124, training accuracy = 0.953058
Evaluation accuracy = 0.971755, Elapsed Time = 2.814553s
Starting Epoch 4:
Training loss = 120.392632, training accuracy = 0.959719
Evaluation accuracy = 0.976763, Elapsed Time = 2.811339s
Starting Epoch 5:
Training loss = 107.019211, training accuracy = 0.964042
Evaluation accuracy = 0.976462, Elapsed Time = 2.839776s
Starting Epoch 6:
Training loss = 96.689064, training accuracy = 0.967515
Evaluation accuracy = 0.976963, Elapsed Time = 2.825862s
Starting Epoch 7:
Training loss = 90.867180, training accuracy = 0.970186
Evaluation accuracy = 0.976562, Elapsed Time = 2.860003s
Starting Epoch 8:
Training loss = 83.686371, training accuracy = 0.972506
Evaluation accuracy = 0.979267, Elapsed Time = 2.683496s
Starting Epoch 9:
Training loss = 79.510612, training accuracy = 0.973524
Evaluation accuracy = 0.983774, Elapsed Time = 2.318410s
```
2. Operation mode: Multiprocess
```
root@64926e30597f:~/dcsysh/singa/examples/cnn# python3 train_multiprocess.py cnn mnist -l 0.015 -w 3 -p float16 -g
Starting Epoch 0:
Training loss = 655.268921, training accuracy = 0.765792
Evaluation accuracy = 0.929087, Elapsed Time = 2.780399s
Starting Epoch 1:
Training loss = 244.490601, training accuracy = 0.917535
Evaluation accuracy = 0.958033, Elapsed Time = 2.811608s
Starting Epoch 2:
Training loss = 174.696671, training accuracy = 0.941774
Evaluation accuracy = 0.961138, Elapsed Time = 2.692022s
Starting Epoch 3:
Training loss = 141.858124, training accuracy = 0.953058
Evaluation accuracy = 0.971755, Elapsed Time = 2.636105s
Starting Epoch 4:
Training loss = 120.392632, training accuracy = 0.959719
Evaluation accuracy = 0.976763, Elapsed Time = 2.783366s
Starting Epoch 5:
Training loss = 107.019211, training accuracy = 0.964042
Evaluation accuracy = 0.976462, Elapsed Time = 2.809483s
Starting Epoch 6:
Training loss = 96.689064, training accuracy = 0.967515
Evaluation accuracy = 0.976963, Elapsed Time = 2.775309s
Starting Epoch 7:
Training loss = 90.867180, training accuracy = 0.970186
Evaluation accuracy = 0.976562, Elapsed Time = 2.773959s
Starting Epoch 8:
Training loss = 83.686371, training accuracy = 0.972506
Evaluation accuracy = 0.979267, Elapsed Time = 2.768378s
Starting Epoch 9:
Training loss = 79.510612, training accuracy = 0.973524
Evaluation accuracy = 0.983774, Elapsed Time = 2.804384s
```
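The per-epoch metrics of the two runs can be compared mechanically rather than by eye. Below is a minimal sketch, assuming the console output above has been saved as text; `parse_log` and the two excerpt variables are hypothetical helpers for illustration and are not part of SINGA. The excerpts contain only the first two epochs copied from the logs above.

```python
import re

# Patterns for the metric lines printed by the training scripts above.
TRAIN_RE = re.compile(r"Training loss = ([\d.]+), training accuracy = ([\d.]+)")
EVAL_RE = re.compile(r"Evaluation accuracy = ([\d.]+), Elapsed Time = ([\d.]+)s")


def parse_log(text):
    """Return one (loss, train_acc, eval_acc) tuple per epoch.

    Elapsed time is ignored on purpose: it is expected to vary between runs.
    """
    train = [(float(loss), float(acc)) for loss, acc in TRAIN_RE.findall(text)]
    evals = [float(acc) for acc, _elapsed in EVAL_RE.findall(text)]
    return [(loss, tacc, eacc) for (loss, tacc), eacc in zip(train, evals)]


# Excerpts (epochs 0 and 1) copied from the MPI and multiprocess logs above.
mpi_excerpt = """
Starting Epoch 0:
Training loss = 655.268921, training accuracy = 0.765792
Evaluation accuracy = 0.929087, Elapsed Time = 2.694215s
Starting Epoch 1:
Training loss = 244.490601, training accuracy = 0.917535
Evaluation accuracy = 0.958033, Elapsed Time = 2.784006s
"""

multiprocess_excerpt = """
Starting Epoch 0:
Training loss = 655.268921, training accuracy = 0.765792
Evaluation accuracy = 0.929087, Elapsed Time = 2.780399s
Starting Epoch 1:
Training loss = 244.490601, training accuracy = 0.917535
Evaluation accuracy = 0.958033, Elapsed Time = 2.811608s
"""

# True when loss and accuracy agree epoch by epoch across the two modes.
metrics_match = parse_log(mpi_excerpt) == parse_log(multiprocess_excerpt)
print(metrics_match)
```

Comparing only loss and accuracy keeps the check focused on numerical behaviour; the elapsed times differ slightly between the two modes and are not part of the comparison.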