Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/09 05:57:54 UTC

[GitHub] [incubator-mxnet] zixuanweeei opened a new pull request #18001: [MKLDNN] Support quantized rnn

zixuanweeei opened a new pull request #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001
 
 
   ## Description ##
   In this PR, we add support for the quantization flow of the RNN operator. Currently, only the LSTM mode supports INT8 inference.
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [x] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
   - For new examples, README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [x] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [x] Add the `_contrib_quantized_rnn` op.
   - [x] Add asymmetric quantization: the `_contrib_quantize_asym` op, which quantizes FP32 data to U8 using a scale and a shift (see the sketches after this list).
   - [x] Add `MXNET_USE_WEIGHT_CACHE` to control the RNN initialization behavior.
   - [x] Support data layout in NDArrayIter. NDArrayIter supports only the `NCHW` layout by default, with no way to feed other layouts, such as the sequential `TNC` layout. This PR changes NDArrayIter to accept such layouts (assuming that N represents the batch axis).
   - [x] Move `MKLDNNRnnMemMgr` into each individual layer.
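   
   Below are two illustrative sketches for the items above. They are not the PR's actual implementation: the exact scale/shift convention of `_contrib_quantize_asym` and the NDArrayIter argument shown here are assumptions.
   
   First, a minimal NumPy model of asymmetric U8 quantization, assuming the common convention `q = round(x * scale + shift)` with the FP32 range mapped onto [0, 255]:
   
   ```python
   import numpy as np
   
   def quantize_asym_u8(x):
       """Asymmetrically quantize an FP32 array to uint8 via a scale and shift.
   
       Illustrative only; the convention of _contrib_quantize_asym may differ.
       """
       x_min, x_max = float(x.min()), float(x.max())
       scale = 255.0 / (x_max - x_min)   # map [x_min, x_max] onto [0, 255]
       shift = -x_min * scale            # offset so that x_min maps to 0
       q = np.clip(np.round(x * scale + shift), 0, 255).astype(np.uint8)
       return q, scale, shift
   
   def dequantize_asym_u8(q, scale, shift):
       # Recover an FP32 approximation of the original values.
       return (q.astype(np.float32) - shift) / scale
   
   x = np.random.uniform(-1.0, 1.0, size=(16,)).astype(np.float32)
   q, scale, shift = quantize_asym_u8(x)
   print(np.abs(dequantize_asym_u8(q, scale, shift) - x).max())  # small error
   ```
   
   Second, a sketch of feeding sequential `TNC` data through NDArrayIter. The `layout` argument stands for the capability this PR adds (the argument name is illustrative; stock NDArrayIter assumes the batch axis comes first):
   
   ```python
   import numpy as np
   import mxnet as mx
   
   # Sequential data in TNC order: (seq_len=35, batch=80, channels=200).
   data = np.random.uniform(size=(35, 80, 200)).astype(np.float32)
   
   # With a TNC layout, batching happens along axis 1 (N), not axis 0.
   data_iter = mx.io.NDArrayIter(data, batch_size=80, layout='TNC')
   ```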
   
   @ciyongch @TaoLv @pengzhao-intel 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-614380958
 
 
   @zixuanweeei could you rebase and resolve the conflict?


[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
eric-haibin-lin commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-613666611
 
 
   Is there a plan to improve `log_softmax` on CPU? 


[GitHub] [incubator-mxnet] ciyongch commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
ciyongch commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-613757343
 
 
   @eric-haibin-lin we'll enable the DNNL primitive for `log_softmax` to improve its performance on CPU, but not in this PR. :)


[GitHub] [incubator-mxnet] zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-614388110
 
 
   > @zixuanweeei could you rebase and resolve the conflict?
   
   Currently, we are focusing on adding this feature to the v1.6.x branch, along with the quantized LSTMP operator. I will port the changes from there to this PR soon. Thanks for the reminder.


[GitHub] [incubator-mxnet] zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-612741705
 
 
   > what's the performance?
   
   We have verified the accuracy and performance using a pre-trained AWD-LSTM language model provided by gluon-nlp (see [this tutorial](https://gluon-nlp.mxnet.io/examples/language_model/language_model.html#Using-a-pre-trained-AWD-LSTM-language-model)).
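   
   For context, here is a minimal sketch of loading that pre-trained AWD-LSTM model with gluon-nlp, following the linked tutorial (the model name and API come from gluon-nlp, not from this PR):
   
   ```python
   import gluonnlp as nlp
   
   # Pre-trained AWD-LSTM language model with WikiText-2 weights,
   # as used in the linked gluon-nlp tutorial.
   awd_model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                          dataset_name='wikitext-2',
                                          pretrained=True)
   ```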
   
   ### Accuracy (PPL, lower is better)
   | Dataset            | FP32  | INT8  |
   |--------------------|-------|-------|
   | Validation dataset | 68.80 | 69.24 |
   | Test dataset       | 65.72 | 66.14 |
   
   The INT8 accuracy is very close to that of FP32.
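   
   As a reminder of the metric, perplexity is the exponential of the average per-token negative log-likelihood, so the small PPL gap above corresponds to an even smaller per-token likelihood difference. A minimal sketch (function name illustrative):
   
   ```python
   import numpy as np
   
   def perplexity(nll_per_token):
       # PPL = exp(mean negative log-likelihood); lower is better.
       return float(np.exp(np.mean(nll_per_token)))
   ```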
   
   ### Performance
   #### Profiler Dumps of FP32 End-to-End 
   | Name                       | Total Count | Time (ms) | Min Time (ms) | Max Time (ms) | Avg Time (ms) |
   |---------------------------:|------------:|----------:|---------:|-------------:|--------------:|
   | log_softmax                | 350         | 10968.93  | 31.09         | 31.54         | 31.34         |
   | RNN                        | 1050        | **5664.45**   | 3.13          | 7.37          | 5.39          |
   | _sg_mkldnn_fully_connected | 350         | 2630.26   | 7.40          | 7.78          | 7.52          |
   | _rnn_param_concat          | 1050        | 2392.41   | 0.94          | 3.73          | 2.28          |
   | Reshape                    | 4200        | 775.83    | 0.01          | 0.64          | 0.18          |
   | DeleteVariable             | 3856        | 185.39    | 0.00          | 0.53          | 0.05          |
   | CopyCPU2CPU                | 2450        | 48.89     | 0.01          | 0.05          | 0.02          |
   | Embedding                  | 350         | 21.29     | 0.06          | 0.08          | 0.06          |
   | WaitForVar                 | 2800        | 12.85     | 0.00          | 0.02          | 0.00          |
   | mean                       | 350         | 9.26      | 0.02          | 0.05          | 0.03          |
   | Dropout                    | 1400        | 8.38      | 0.00          | 0.01          | 0.01          |
   | sum                        | 350         | 6.85      | 0.02          | 0.04          | 0.02          |
   | pick                       | 350         | 6.55      | 0.02          | 0.03          | 0.02          |
   | _mul_scalar                | 350         | 3.56      | 0.01          | 0.02          | 0.01          |
   | _zeros                     | 6           | 0.16      | 0.01          | 0.07          | 0.03          |
   | Total                      |             | **22735.04**  |               |               |               |
   
   #### Profiler Dumps of INT8 End-to-End
   | Name                       | Total Count | Time (ms) | Min Time (ms) | Max Time (ms) | Avg Time (ms) |
   |-------------------:|-----------:|-----------:|---------------:|---------------:|---------------:|
   | log_softmax                | 350         | 10805.84  | 30.72         | 35.89         | 30.87         |
   | _contrib_quantized_rnn     | 1050        | **2857.42**   | 1.52          | 3.81          | 2.72          |
   | _rnn_param_concat          | 1050        | 2375.36   | 0.83          | 5.93          | 2.26          |
   | _contrib_quantize_asym     | 1050        | 1580.61   | 0.55          | 4.87          | 1.51          |
   | _sg_mkldnn_fully_connected | 350         | 1559.83   | 4.42          | 4.65          | 4.46          |
   | Reshape                    | 4200        | 762.71    | 0.01          | 0.66          | 0.18          |
   | DeleteVariable             | 3856        | 131.79    | 0.00          | 0.44          | 0.03          |
   | CopyCPU2CPU                | 2450        | 48.68     | 0.01          | 0.06          | 0.02          |
   | Embedding                  | 350         | 21.03     | 0.06          | 0.08          | 0.06          |
   | WaitForVar                 | 2796        | 12.34     | 0.00          | 0.02          | 0.00          |
   | _contrib_quantize_v2       | 350         | 11.29     | 0.03          | 0.06          | 0.03          |
   | mean                       | 350         | 9.17      | 0.02          | 0.15          | 0.03          |
   | Dropout                    | 1400        | 8.31      | 0.00          | 0.01          | 0.01          |
   | sum                        | 350         | 6.63      | 0.02          | 0.04          | 0.02          |
   | pick                       | 350         | 6.22      | 0.02          | 0.03          | 0.02          |
   | _mul_scalar                | 350         | 3.67      | 0.01          | 0.03          | 0.01          |
   | _zeros                     | 6           | 0.11      | 0.01          | 0.07          | 0.02          |
   | Total                      |             | **20201.01**  |               |               |               |
   
   End-to-end latency gets only a ~1.1x speedup (22735.04 ms vs. 20201.01 ms), which is not that good. However, `_contrib_quantized_rnn` itself gets a ~2.0x speedup compared with `RNN`. Since `RNN` occupies only ~25% of the total time, while `log_softmax` alone takes ~48%, the end-to-end benefit of the `_contrib_quantized_rnn` speedup is diluted. In addition, `_contrib_quantize_asym` currently performs poorly and needs further optimization (WIP).
   
   Besides, the quantization flow of LSTM moves only the GEMM operations into INT8 computation. The others, such as gate additions, bias additions, and element-wise activations, remain in FP32. So the speedup of `_contrib_quantized_rnn` cannot reach the expected 3~4x.
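   
   A quick arithmetic check of the dilution effect, using only the numbers from the tables above:
   
   ```python
   # Amdahl-style estimate: only the RNN portion is accelerated.
   total_fp32 = 22735.04   # ms, FP32 end-to-end
   rnn_fp32   = 5664.45    # ms spent in RNN (FP32)
   rnn_int8   = 2857.42    # ms spent in _contrib_quantized_rnn (INT8)
   
   estimated_total = total_fp32 - rnn_fp32 + rnn_int8   # ~19928 ms
   print(total_fp32 / estimated_total)                  # ~1.14x rough estimate
   # Consistent with the measured 22735.04 / 20201.01 ~ 1.1x once the extra
   # _contrib_quantize_asym cost and the faster INT8 FC layer are included.
   ```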


[GitHub] [incubator-mxnet] mxnet-bot commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-611344017
 
 
   Hey @zixuanweeei, thanks for submitting the PR.
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
   - To trigger all jobs: @mxnet-bot run ci [all]
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2]
   ***
   **CI supported jobs**: [windows-cpu, website, sanity, miscellaneous, unix-gpu, centos-gpu, clang, unix-cpu, edge, centos-cpu, windows-gpu]
   ***
   _Note_:
   Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
   All CI tests must pass before the PR can be merged.
   


[GitHub] [incubator-mxnet] zixuanweeei commented on pull request #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on pull request #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-619294863


   @mxnet-bot run ci [windows-gpu]





[GitHub] [incubator-mxnet] mxnet-bot commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-618275000


   Jenkins CI successfully triggered : [windows-gpu]





[GitHub] [incubator-mxnet] zixuanweeei commented on pull request #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on pull request #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-618728000


   @mxnet-bot run ci [windows-gpu]





[GitHub] [incubator-mxnet] mxnet-bot commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-618095929


   Jenkins CI successfully triggered : [unix-cpu, windows-gpu, centos-cpu, sanity, miscellaneous, website, clang, windows-cpu, centos-gpu, unix-gpu, edge]





[GitHub] [incubator-mxnet] zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on issue #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-618095869


   @mxnet-bot run ci [all]





[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-619294876


   Jenkins CI successfully triggered : [windows-gpu]





[GitHub] [incubator-mxnet] zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
zixuanweeei commented on issue #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-618274921


   @mxnet-bot run ci [windows-gpu]





[GitHub] [incubator-mxnet] pengzhao-intel closed pull request #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
pengzhao-intel closed pull request #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001


   





[GitHub] [incubator-mxnet] pengzhao-intel commented on pull request #18001: [MKLDNN] Support quantized rnn

Posted by GitBox <gi...@apache.org>.
pengzhao-intel commented on pull request #18001:
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-679207792


   Closing since we need to refactor the quantization flow on the master branch.

