Posted to dev@singa.apache.org by GitBox <gi...@apache.org> on 2020/08/12 02:22:56 UTC

[GitHub] [singa] dcslin opened a new pull request #779: half float update

dcslin opened a new pull request #779:
URL: https://github.com/apache/singa/pull/779


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [singa] lgtm-com[bot] commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-672494487


   This pull request **introduces 5 alerts** and **fixes 13** when merging 5b18677291672a206820b8d19897b79e84b070fe into c5769f1cc8a53c1e849c97641d71221bd4fdb5d4 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-a3d44a0c8fe96deb14e18efeec185a892d43c3d8)
   
   **new alerts:**
   
   * 2 for Testing equality to None
   * 1 for Duplication in regular expression character class
   * 1 for Unused local variable
   * 1 for Unused import
   
   **fixed alerts:**
   
   * 9 for Missing call to __init__ during object initialization
   * 2 for Unreachable code
   * 1 for Unnecessary pass
   * 1 for Unused local variable


----------------------------------------------------------------



[GitHub] [singa] lgtm-com[bot] commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-675610454


   This pull request **introduces 1 alert** when merging 75d8ec1eac270b3d99a889e8722a99181ca457d8 into 3f0997db042a5f0fa91732e25073f1e5afd7c6c8 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-691e3f28e68964cca2c6e4eca9ffb0c6adbb6066)
   
   **new alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------



[GitHub] [singa] dcslin commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-680390565


   Tested the examples below as a checkpoint:
   native.py with fp16
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/mlp/native.py -pfloat16
   train_data_shape: (400, 2)
   train_label_shape: (400, 2)
   training loss =  0.6914
   training loss =  0.585
   training loss =  0.5596
   training loss =  0.539
   training loss =  0.4944
   training loss =  0.4238
   training loss =  0.319
   training loss =  0.2502
   training loss =  0.2102
   training loss =  0.1869
   training loss =  0.1671
   ```
   native.py with fp32
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/mlp/native.py 
   train_data_shape: (400, 2)
   train_label_shape: (400, 2)
   training loss =  0.6908379
   training loss =  0.5781224
   training loss =  0.5531873
   training loss =  0.5157491
   training loss =  0.45046344
   training loss =  0.3674125
   training loss =  0.2854403
   training loss =  0.23216258
   training loss =  0.19450127
   training loss =  0.16646467
   training loss =  0.13695152
   ```
   
   
   module.py on fp16 with graph on
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/mlp/module.py -pfloat16 
   WARNING: Logging before InitGoogleLogging() is written to STDERR
   F0826 00:48:40.063864 34058 tensor.cc:223] Check failed: block() && block()->initialized() == true the data of the tensor needs be initialized before casting to another type
   *** Check failure stack trace: ***
   Aborted (core dumped)
   ```
   module.py on fp16 with graph off
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/mlp/module.py -pfloat16 -g
   training loss =  0.6094
   training loss =  0.5225
   training loss =  0.467
   training loss =  0.404
   training loss =  0.3582
   training loss =  0.328
   training loss =  0.3164
   training loss =  0.3086
   training loss =  0.3108
   training loss =  0.3142
   training loss =  0.3198
   ```
   module.py on fp32 with graph on
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/mlp/module.py 
   training loss =  0.61159235
   training loss =  0.5169311
   training loss =  0.43573818
   training loss =  0.34147996
   training loss =  0.26603624
   training loss =  0.21422084
   training loss =  0.17843087
   training loss =  0.15283388
   training loss =  0.13402645
   training loss =  0.11964666
   training loss =  0.10839656
   ```
   
   
   train_cnn.py with the mlp model on fp16, graph on
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py mlp mnist -m2 -pfloat16 
   Starting Epoch 0:
   WARNING: Logging before InitGoogleLogging() is written to STDERR
   F0826 00:49:13.757282 34338 tensor.cc:223] Check failed: block() && block()->initialized() == true the data of the tensor needs be initialized before casting to another type
   *** Check failure stack trace: ***
   Aborted (core dumped)
   ```
   train_cnn.py with the mlp model on fp16, graph off
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py mlp mnist -m2 -pfloat16 -g
   Starting Epoch 0:
   Training loss = 449.630493, training accuracy = 0.869180
   Evaluation accuracy = 0.921675, Elapsed Time = 3.134102s
   Starting Epoch 1:
   Training loss = 250.288086, training accuracy = 0.925110
   Evaluation accuracy = 0.937200, Elapsed Time = 3.186108s
   root@1c6aaef3db53:~/singa-hp2# 
   ```
   train_cnn.py with the mlp model on fp32, graph off
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py mlp mnist -m2 -pfloat32 -g
   Starting Epoch 0:
   Training loss = 446.399231, training accuracy = 0.870331
   Evaluation accuracy = 0.922676, Elapsed Time = 2.745227s
   Starting Epoch 1:
   Training loss = 246.745819, training accuracy = 0.926194
   Evaluation accuracy = 0.938301, Elapsed Time = 2.591690s
   ```
   
   train_cnn.py with the cnn model on fp16, graph on
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py cnn mnist -m2 -pfloat16 
   Starting Epoch 0:
   WARNING: Logging before InitGoogleLogging() is written to STDERR
   F0826 00:49:58.988692 34502 tensor.cc:223] Check failed: block() && block()->initialized() == true the data of the tensor needs be initialized before casting to another type
   *** Check failure stack trace: ***
   Aborted (core dumped)
   ```
   train_cnn.py with the cnn model on fp16, graph off
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py cnn mnist -m2 -pfloat16 -g
   Starting Epoch 0:
   Training loss = 599.249878, training accuracy = 0.788737
   Evaluation accuracy = 0.940104, Elapsed Time = 9.316158s
   Starting Epoch 1:
   Training loss = 236.738007, training accuracy = 0.920641
   Evaluation accuracy = 0.959335, Elapsed Time = 9.277672s
   ```
   train_cnn.py with the cnn model on fp32, graph off
   ```
   root@1c6aaef3db53:~/singa-hp2# PYTHONPATH=build/python/ python3 examples/cnn/train_cnn.py cnn mnist -m2 -pfloat32 -g
   Starting Epoch 0:
   Training loss = 596.964600, training accuracy = 0.789421
   Evaluation accuracy = 0.943209, Elapsed Time = 8.189669s
   Starting Epoch 1:
   Training loss = 234.664322, training accuracy = 0.920758
   Evaluation accuracy = 0.960036, Elapsed Time = 8.101694s
   ```
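   (For context, a minimal sketch, not the actual example code, of what the `-pfloat16` option amounts to on the data side: the precision string maps to a numpy dtype and batches are cast before being fed to the model. `cast_batch` here is a hypothetical helper for illustration.)
   ```
   import numpy as np

   # Hypothetical helper: map the precision option string to a
   # numpy dtype and cast the input batch accordingly.
   def cast_batch(batch, precision="float32"):
       dtype = np.float16 if precision == "float16" else np.float32
       return batch.astype(dtype)

   x = np.random.rand(400, 2).astype(np.float32)
   print(cast_batch(x, "float16").dtype)  # float16
   ```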
   


----------------------------------------------------------------



[GitHub] [singa] dcslin closed pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin closed pull request #779:
URL: https://github.com/apache/singa/pull/779


   


----------------------------------------------------------------



[GitHub] [singa] dcslin edited a comment on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin edited a comment on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-672713946


   refactored https://github.com/apache/singa/pull/775
   #### half type:
   - `c++` add `half.hpp`
   - `c++` `f16<->f32` `singa::TypeCast` (scalar)
   - `cmake` add nvcc compilation flag arch 7.0 for fp16
   #### f16 tensor:
   - creation
     - `py` f16 `tensor.from_numpy()`
   - IO
     - set value
       - `c++` `f16` to `tensor::set_value()`
       - `c++` cast DType to SType in `Tensor::SetValue`
     - copy from numpy
       - `py` f16 `tensor.copy_from_numpy()`
       - `py` raise an exception in `tensor.copy_from_numpy()` for unsupported types
       - `SWIG` API `CopyHalfFloatDataFromHostPtr`
     - get value
       - `c++` `f16` `Transform`: to make contiguous
       - `c++` explicit instantiation for `CopyDataFromHostPtr<f16>`
       - `SWIG` f16 typemaps
       - `SWIG` API `GetHalfFloatValue`
   - conversion
     - `py` f16 `tensor.as_type()` (see the sketch after this list)
     - `c++` `f16<->f32` `cpu/cuda` CastCopy
   - arithmetic
     - `c++` `f16<->f32` `cpu/cuda` `TYPE_LANG_SWITCH`
     - `c++` `f16<->f32` `cpu/cuda` `TYPE_TYPE_LANG_SWITCH`
     - `c++` `f16` `TYPE_SWITCH`
     - `c++` cast DType to SType in `EltwiseTensorScalarFn`
     - `c++` cast DType to SType in `Div` (scalar/tensor)
   - math
     - `c++` `f16` `cpu/cuda` Gaussian
     - `c++` use `GetCudnnDataType` in `generate_tensor_nd_desc` for generic type support
   #### layer
   - `layer.dtype_check` to check that inputs and params have the same dtype
   #### relu
   - `c++` f16 cuda math `ReLU`
     - `cuda` f16 `KernelRelu`
     - `cuda` f16 `cuda::relu` kernel wrapper
   #### linear
   - `py` f16 init params
   - `c++` f16 cuda math `GEMM`
   #### softmax cross entropy
   - `c++` f16 cuda math `EltwiseMult`
     - `cuda` f16 `KernelMult`
     - `cuda` f16 `cuda::mult` kernel wrapper
   - `c++` f16 cuda math `ComputeCrossEntropy`
     - `cuda` f16 `KernelComputeCrossEntropy`
     - `cuda` f16 `cuda::ComputeCrossEntropy` kernel wrapper
   - `c++` f16 cuda math `SoftMax`
   #### train cnn
   - add f16 option
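   Taken together, the creation and conversion items suggest py-side usage along these lines; a minimal sketch assuming the f16 path added in this PR (the `tensor.float16`/`tensor.float32` dtype constants are assumed from the existing naming):
   ```
   import numpy as np
   from singa import tensor

   # f16 tensor straight from a float16 numpy array
   # (py f16 tensor.from_numpy() above).
   t16 = tensor.from_numpy(np.ones((2, 3), dtype=np.float16))

   # f16 <-> f32 conversion (py f16 tensor.as_type(),
   # backed by the cpu/cuda CastCopy above).
   t32 = t16.as_type(tensor.float32)
   t16_back = t32.as_type(tensor.float16)
   ```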


----------------------------------------------------------------



[GitHub] [singa] dcslin commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-680395143


   Problems:
   - Training with the graph enabled fails, though `compile()` succeeds.
   - Some fp16 operations reuse the fp32 implementation for now; this could be fixed separately at the `tensor_math_cuda.h` level.
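   For illustration only, the fp32 fallback amounts to a cast-compute-cast pattern; a Python-level sketch of the idea (the actual reuse sits inside `tensor_math_cuda.h`, not in Python):
   ```
   from singa import tensor

   # Sketch of the fallback: cast the f16 input up to f32, run the
   # existing f32 kernel, then cast the result back down to f16.
   def op_via_fp32(t16, op):
       out32 = op(t16.as_type(tensor.float32))
       return out32.as_type(tensor.float16)
   ```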


----------------------------------------------------------------



[GitHub] [singa] dcslin commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-680461629


   Updating the design per @chrishkchris's advice.


----------------------------------------------------------------



[GitHub] [singa] dcslin commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-672713946


   refactored https://github.com/apache/singa/pull/775
   
   changes:
   - cpp
     - tensor impl across device
       - add `half.hpp`
       - IO
         - add `f16` to `TYPE_SWITCH`
         - add `f16<->f32` to `singa::TypeCast` (scalar)
         - add explicit instantiation for `CopyDataFromHostPtr<f16>`
         - add cast DType to SType in `Tensor::SetValue`
         - add `f16` to `tensor::set_value()`
       - math
         - add `f16<->f32` for `cpu/cuda` to `TYPE_LANG_SWITCH`
         - add `f16<->f32` for `cpu/cuda` to `TYPE_TYPE_LANG_SWITCH`
         - add cast DType to SType in `EltwiseTensorScalarFn`
         - add cast DType to SType in `Div` (scalar/tensor)
     - tensor ops impl on device
       - add `GetCudnnDataType` to `generate_tensor_nd_desc`
       - add `f16` to `Transform`
       - add `f16` on `cpu/cuda` Gaussian
       - add `f16<->f32` for `cpu/cuda` to CastCopy
   - SWIG
     - add f16 typemaps
     - add API `GetHalfFloatValue`
     - add API `CopyHalfFloatDataFromHostPtr`
   - py
     - add f16 to `tensor.as_type()`
     - add f16 to `tensor.copy_from_numpy()`
      - raise an exception in `tensor.copy_from_numpy()` for unsupported types (see the sketch below)
      - add f16 to `tensor.from_numpy()`
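   A minimal sketch of the py-side IO behavior above, assuming the f16 support added here; the `tensor.float16` constant, the `Tensor` constructor keyword, and the exact exception raised for unsupported types are assumptions:
   ```
   import numpy as np
   from singa import tensor

   # Copy a float16 numpy array into an f16 tensor
   # (py f16 tensor.copy_from_numpy() above).
   t = tensor.Tensor((2, 2), dtype=tensor.float16)
   t.copy_from_numpy(np.ones((2, 2), dtype=np.float16))

   # Unsupported numpy dtypes now raise instead of silently miscopying
   # (exception type assumed for illustration).
   try:
       t.copy_from_numpy(np.ones((2, 2), dtype=np.int8))
   except Exception as e:
       print("unsupported dtype:", e)
   ```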


----------------------------------------------------------------



[GitHub] [singa] dcslin edited a comment on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
dcslin edited a comment on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-680395143


   Problems:
   - Training with the graph enabled fails, though `compile()` succeeds.
   - Some fp16 operations reuse the fp32 implementation for now; this could be fixed separately at the `tensor_math_cuda.h` level as a follow-up.


----------------------------------------------------------------



[GitHub] [singa] lgtm-com[bot] commented on pull request #779: half float update

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #779:
URL: https://github.com/apache/singa/pull/779#issuecomment-675828752


   This pull request **introduces 1 alert** when merging 29c8b44a177c4ac6f59ea188f3678861b4814444 into 3f0997db042a5f0fa91732e25073f1e5afd7c6c8 - [view on LGTM.com](https://lgtm.com/projects/g/apache/singa/rev/pr-35a837611bbe999ebed50c66b032099b71dc8d6c)
   
   **new alerts:**
   
   * 1 for Unused local variable


----------------------------------------------------------------