Posted to dev@mxnet.apache.org by "Lv, Tao A" <ta...@intel.com> on 2019/07/20 15:05:46 UTC

[Discuss] Upgrade MKL-DNN submodule to its v1.0 release


Hi dev,



MKL-DNN just published its first major release this month: https://github.com/intel/mkl-dnn/releases/tag/v1.0. Here I would like to start a discussion about upgrading the MKL-DNN integration in MXNet from the current v0.20 to v1.0.



Motivation

To improve the general look-and-feel of the library and solve a few important design issues, the v1.0 major release changes some of the data structures, primitive APIs and the execution model, and accordingly breaks compatibility with v0.x versions. The changes in MKL-DNN v1.0 are mostly covered in the RFC for v1.0 <https://github.com/intel/mkl-dnn/tree/rfc-api-changes-v1.0/doc/rfc/api-v1.0>. The major changes are listed below:
*        Support large tensors with int64_t dimension sizes.
*        Expose the scratchpad to support stateless primitives and better memory management, and hence thread safety.
*        Pass memory and stream to primitives at execution time (see the sketch after this list).
*        Rework the MKL-DNN memory descriptor.
*        Split LSTM/GRU/RNN into separate primitives.
*        Remove the MKLML dependency and stop releasing MKLML and iomp packages from the MKL-DNN repository.
*        Support Intel integrated graphics.
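
To make the new programming model concrete, below is a minimal, hypothetical C++ sketch against the v1.0 API (int64_t dims and format tags in the reworked memory descriptor, and memory/stream passed at execution time). The names follow the v1.0 headers, but this is an illustration only, not code from the MXNet integration:

    #include <mkldnn.hpp>  // MKL-DNN v1.0 C++ API

    using namespace mkldnn;

    int main() {
        engine eng(engine::kind::cpu, 0);
        stream strm(eng);

        // The reworked memory descriptor takes int64_t dims and a format
        // tag (plain strides or format_tag::any are also possible).
        memory::dims shape = {1, 16, 224, 224};
        memory::desc md(shape, memory::data_type::f32,
                        memory::format_tag::nchw);
        memory src_mem(md, eng);
        memory dst_mem(md, eng);

        // Primitives are now created without memory bound to them ...
        auto relu_d = eltwise_forward::desc(prop_kind::forward_inference,
                                            algorithm::eltwise_relu, md, 0.f);
        auto relu_pd = eltwise_forward::primitive_desc(relu_d, eng);
        auto relu = eltwise_forward(relu_pd);

        // ... and memory and stream are passed at execution time, unlike
        // the v0.x model where memory was bound at primitive construction
        // and a net of primitives was submitted to an eager stream.
        relu.execute(strm, {{MKLDNN_ARG_SRC, src_mem},
                            {MKLDNN_ARG_DST, dst_mem}});
        strm.wait();
        return 0;
    }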

With these changes, we can resolve or mitigate several existing issues in MXNet, e.g. #15576 for thread safety, #15544 for the MKLML/iomp5 license issue, and the int64 tensor size limitation of the MKL-DNN backend. Besides that, all new features will go into v1.x and will not be backported to v0.x, so MXNet needs to update its MKL-DNN dependency to v1.0 to leverage the new features and performance improvements.
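
Continuing the sketch above, and assuming the v1.0 primitive_attr/scratchpad_mode API, the following hedged example shows how a caller-owned scratchpad makes a primitive stateless: a single immutable primitive object can then be executed concurrently, with each thread supplying its own scratchpad memory. The exact policy the MXNet integration adopts is a design decision for the development phase:

    // Ask the primitive not to keep internal temporary storage.
    primitive_attr attr;
    attr.set_scratchpad_mode(scratchpad_mode::user);

    auto pd = eltwise_forward::primitive_desc(relu_d, attr, eng);
    auto prim = eltwise_forward(pd);

    // Each thread allocates a scratchpad of the size the primitive
    // reports and passes it in like any other memory argument.
    memory scratchpad(pd.scratchpad_desc(), eng);
    prim.execute(strm, {{MKLDNN_ARG_SRC, src_mem},
                        {MKLDNN_ARG_DST, dst_mem},
                        {MKLDNN_ARG_SCRATCHPAD, scratchpad}});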



Development

Basically we will follow the same integration methodology we used for the v0.x integration, including operator implementation, registration, NDArray modification and graph partitioning. For better collaboration within the community, we will have a feature branch for the development and validation of the MKL-DNN 1.0 integration. All PRs to the feature branch should pass code review and CI and finally get committers' approval. The development can be divided into 3 parts, and all the work will be done before Q3'19 ends. During development, the feature branch will be synced with the master branch periodically.
*        P1: make/cmake build with MKL-DNN v1.0 and integration of all FP32 CNN operators (in src/operator/nn/mkldnn/). We can do FP32 training and inference for CNN models after P1 is done.
*        P2: the quantization pass and integration of INT8 operators (in src/operator/quantization/mkldnn). We can do INT8 quantization and INT8 inference after P2 is done.
*        P3: RNN operators integration.

If needed, documents will be revised accordingly during the development.



Validation:
*        Use feature branch for development - all PRs should pass MXNet CI.
*        Disable MKL-DNN related tests at the beginning of development and re-enable them incrementally as development proceeds.
*        Intel internal validation: mainly focused on performance and convergence validation on CPU, with models from the MXNet examples, Gluon-CV and Gluon-NLP.



Criteria for development done:
*        MXNet CI: pass all existing unit tests and nightly tests
*        Accuracy: pass training convergence and inference accuracy validation
*        Performance: achieve FP32/INT8 performance similar to the v0.x integration



Upstreaming to master branch:

After development is done, we will start to upstream the feature branch to the master branch. Since we cannot have two MKL-DNN libraries in MXNet simultaneously, the upstreaming should be done in a single PR. That PR will likely be large, so I hope the community can take time to review and comment during the development of the feature branch.



We need to do our best to make this happen before the 1.6.0 release so we can address the license issue raised in the 1.5.0 vote.



Please let me know what you think about this plan. If you think something should be fixed or improved in this integration, please let me know as well.



thanks,

-tao (on behalf of the Intel MXNet team)


Re: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Posted by Marco de Abreu <ma...@gmail.com>.
Great job, well done everyone!!


RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Posted by "Lv, Tao A" <ta...@intel.com>.
Hi dev,

The feature branch mkldnn-v1.0 has been merged to master. We really appreciate your support for this task.
Branch: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0
Project: https://github.com/apache/incubator-mxnet/projects/16
PR: https://github.com/apache/incubator-mxnet/pull/16555

If possible, we would ask downstream projects to help verify the latest master branch, and please feel free to report any issues.

Thanks,
-tao


RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Posted by "Lv, Tao A" <ta...@intel.com>.
Update:

I just cut the feature branch for the MKL-DNN 1.0 integration: https://github.com/apache/incubator-mxnet/tree/mkldnn-v1.0

Thanks,
-tao


RE: [Discuss] Upgrade MKL-DNN submodule to its v1.0 release

Posted by "Lv, Tao A" <ta...@intel.com>.
It seems there are no objections. I will try to cut the feature branch in the coming days.

Thanks,
-tao
