Posted to dev@mxnet.apache.org by Jun Wu <wu...@gmail.com> on 2018/02/01 04:40:29 UTC

Re: Intel Plan for the contribution to MXNET

Hi Patric,

Thanks for the contribution. It’s great to see actions on developing INT8
inference for CPU! I have a few questions and hope you can answer them.

1. When you said your work is aligned with PR9552
<https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you
used the quantization+calibration flow developed in that PR for
benchmarking inference?
2. In your MNIST benchmark, which operators are quantized?
3. Is the MNIST quantized model calibrated?
4. Is the inference accuracy of INT8 produced by the *calibrated* quantized
model, or by the quantized model without calibration?
5. What are the throughputs of the FP32 model and the INT8 model for
inference, respectively?
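
For question 5, one minimal way to measure inference throughput with the
Module API is sketched below; the input shape, batch size, and iteration
counts are placeholders rather than values from this thread.

    import time
    import mxnet as mx

    def measure_throughput(sym, arg_params, aux_params, batch_size=64,
                           n_batches=100, data_shape=(1, 28, 28)):
        # Bind the symbol (FP32 or quantized) for CPU inference.
        mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
        mod.bind(data_shapes=[('data', (batch_size,) + data_shape)],
                 for_training=False)
        mod.set_params(arg_params, aux_params)
        batch = mx.io.DataBatch([mx.nd.ones((batch_size,) + data_shape)])
        for _ in range(10):                      # warm-up
            mod.forward(batch, is_train=False)
        mx.nd.waitall()
        start = time.time()
        for _ in range(n_batches):
            mod.forward(batch, is_train=False)
        mx.nd.waitall()                          # wait for async execution
        return batch_size * n_batches / (time.time() - start)   # images/sec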

Thanks,
Jun


On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric <pa...@intel.com> wrote:

> Hi MXNET developers,
>
> We are from the Intel Software and Services Group (SSG) and are working on
> performance optimization of MXNet for Intel Architecture (IA).
>
> Let me briefly introduce our ongoing projects.
>
> Any suggestions and comments are highly appreciated.
>
>
> 1)      MKL-DNN integration with new NNVM interface
>
> Together with Zheng-Da, we have designed a new NNVM-based interface for MKL-DNN.
>
> The new implementation shows better performance and flexibility than the
> old MKL engine.
>
>
>
> The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302)
> and many thanks for your great comments in the thread :)
>
> After the PR is merged, we will push more MKL-DNN related features and
> performance optimizations, such as a fused conv + relu op for inference.
>
>
>
> 2)      INT8 inference
>
> MKL-DNN also provides INT8 computations for ops such as conv, relu, and
> pooling, which can improve inference performance significantly with only a
> very slight accuracy drop (typically <1%).
>
> Currently, we have implemented quantization, de-quantization, and some
> computing ops in a local branch.
>
> Our latest implementation is aligned with this PR
> (https://github.com/apache/incubator-mxnet/pull/9552) and passes the unit
> tests.
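
To make the quantize/de-quantize step above concrete, here is a minimal
NumPy sketch of a symmetric INT8 scheme; the actual ops in the PR and in
MKL-DNN use their own scaling and threshold conventions, so treat this only
as an illustration of the arithmetic.

    import numpy as np

    def quantize_int8(x):
        # Map the FP32 range [-max|x|, +max|x|] onto the INT8 range [-127, 127].
        scale = 127.0 / np.max(np.abs(x))
        q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize_int8(q, scale):
        # Recover an FP32 approximation; the rounding error here is the source
        # of the small accuracy drop mentioned above.
        return q.astype(np.float32) / scale

    x = np.random.randn(3, 3).astype(np.float32)
    q, scale = quantize_int8(x)
    print(np.abs(dequantize_int8(q, scale) - x).max())   # small reconstruction error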
>
>
>
> For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset,
> we got very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).
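
For reference, the simple network described above can be written with the
Symbol API roughly as follows; the kernel size and filter count are
illustrative guesses, not the exact settings used in the benchmark.

    import mxnet as mx

    data = mx.sym.Variable('data')                         # MNIST input: (batch, 1, 28, 28)
    conv = mx.sym.Convolution(data=data, kernel=(5, 5), num_filter=32, name='conv')
    relu = mx.sym.Activation(data=conv, act_type='relu', name='relu')
    flat = mx.sym.Flatten(data=relu, name='flatten')
    fc   = mx.sym.FullyConnected(data=flat, num_hidden=10, name='fc')
    net  = mx.sym.SoftmaxOutput(data=fc, name='softmax')   # conv+relu+flatten+FC+softmax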
>
> We will update a summary of our solution in this PR soon.
>
>
>
> I hope the CPU and GPU paths can stay compatible and share a common code
> base, so I think we need more discussion in the PR :)
>
>
>
> 3)      RNN implementations
>
> Currently, there is no CPU implementation for mx.sym.rnn, and the Python
> implementation is really slow.
>
> We are working on resolving this issue from two aspects:
>
> -          Provide a C/C++ level implementation, registered via
> FCompute<cpu> (GPU code should be moved to NNVM as well).
>
> We plan to submit a PR for the LSTM/GRU in March; our initial results are
> below, FYI (a short GRU sketch follows at the end of this section).
>
> Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep
> Speech 2)
>
> Forward time          mx.sym.gru bound to Intel GRU C (s)   Native mx.rnn.GRUCell (s)
> SKX 6148, 2 sockets   1.32                                  72.7
>
>
>
>
> -          Provide the MKL-DNN RNN interface (under development,
> https://github.com/intel/mkl-dnn/issues/46), registered via
> FComputeEx<cpu>.
>
> A higher-performance RNN is under development by the MKL-DNN team, and we
> will integrate it when it is ready.
>
> I think CPU users can get a further performance boost from the MKL-DNN
> library.
>
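
To put the GRU numbers above in context, here is a minimal sketch of the
Python-level path (mx.rnn.GRUCell unrolled over the benchmark shapes) that a
fused C/C++ kernel registered via FCompute<cpu> would replace; it is an
illustration only, not the benchmark script.

    import mxnet as mx

    # Shapes from the table above: N = 12 (batch), T = 1600 (time steps),
    # I = 161 (input size), H = 1760 (hidden size).
    N, T, I, H = 12, 1600, 161, 1760

    data = mx.sym.Variable('data')                   # layout NTC: (N, T, I)
    cell = mx.rnn.GRUCell(num_hidden=H, prefix='gru_')
    outputs, states = cell.unroll(length=T, inputs=data, layout='NTC',
                                  merge_outputs=True)
    # 'outputs' is the (N, T, H) output sequence; the unrolled graph built here
    # is presumably what the "Native mx.rnn.GRUCell" column times, while a
    # single fused operator would correspond to the "Intel GRU C" column.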
>      Thanks in advance!
>
>      BR,
>
>     -- Patric
>
>

Re: Intel Plan for the contribution to MXNET

Posted by Jun Wu <wu...@gmail.com>.
Great. Let's coordinate to keep our efforts aligned.

On Wed, Jan 31, 2018 at 9:51 PM, Zhao, Patric <pa...@intel.com> wrote:

> Thanks, Jun, please see my comments inline.
>
>
>
> Wenting and Jin will follow up the tasks in the PR.
>
>
>
> *From:* Jun Wu [mailto:wujun.nju@gmail.com]
> *Sent:* Thursday, February 1, 2018 12:40 PM
> *To:* dev@mxnet.incubator.apache.org
> *Cc:* Ye, Jason Y <ja...@intel.com>; Lv, Tao A <ta...@intel.com>;
> Jiang, Wenting <we...@intel.com>; Zhao, Patric <
> patric.zhao@intel.com>
> *Subject:* Re: Intel Plan for the contribution to MXNET
>
>
>
> Hi Patric,
>
>
>
> Thanks for the contribution. It’s great to see actions on developing INT8
> inference for CPU! I have a few questions and hope to have your answers.
>
>
>
> 1.      When you said your work is aligned with PR9552
> <https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you
> used quantization+calibration flows developed in that PR for benchmarking
> inferences?
>
> [Patric] The benchmark accuracy is based on MKLDNN and ziheng’s old
> quantization branch.
>
> Now we have merged to master (based on #8302) with
> quantization+calibration PR for int8 development, will show you the
> accuracy and performance soon.
>
>
>
> 2.      In you MNIST benchmark, what operators are quantized?
>
> [Patric] Conv, relu and flatten are quantized in our mnist benchmark
> (conv+relu+flatten+FC+softmax).
>
> Besides, MKLDNN supports pooling, concat and fused(conv with relu/elem/bn)
> int8 ops.
>
>
>
> 3.      Is the MNIST quantized model calibrated?
>
> [Patric] Not yet, we did the experiment on ziheng’s old quantization
> branch, now we are moving to branch of quantization+calibration PR.
>
>
>
> 4.      Is the inference accuracy of INT8 produced by the *calibrated*
> quantized model, or just quantized model without calibration?
>
> [Patric] Without calibration
>
>
>
> 5.      What are the throughputs of FP32 model and INT8 model for
> inference, respectively?
>
> [Patric] In this stage, we are mainly focus on the accuracy and algorithm.
> The performance fine tune is on the way J
>
>
>
> Thanks,
>
> Jun
>
>
>
> On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric <pa...@intel.com>
> wrote:
>
> Hi MXNET developers,
>
> We are from Intel Software and Service Group (SSG) and working on the
> performance optimization for MXNET on Intel Architecture (IA).
>
> Let me do a simple introduction about our ongoing projects.
>
> Any suggestions and comments are highly appreciated.
>
>
> 1)      MKL-DNN integration with new NNVM interface
>
> We have designed a new interface of MKL-DNN by NNVM with Zheng-Da together.
>
> The new implementation shows the better performance and flexibility than
> old MKL engine.
>
>
>
> The PR is under review (https://github.com/apache/
> incubator-mxnet/pull/8302) and very thanks for your great comments in the
> thread :)
>
> After the PR is merged, we will push more MKL-DNN related features and
> performance optimization strategies, such as fused conv + relu OP for the
> inference.
>
>
>
> 2)      INT8 inference
>
> MKL-DNN also provides the int8 calculations such as for conv, relu,
> pooling which can improve the inference performance a lot within very
> slight accuracy drop (like <1%).
>
> Currently, we have implemented quantization, de-quantization, and some
> computing Ops in local branch.
>
> Our latest implementation is aligned with this PR (
> https://github.com/apache/incubator-mxnet/pull/9552) and passed the unit
> test.
>
>
>
> For a simple network (conv+relu+flatten+FC+softmax) with mnist dataset, we
> got very similar inference accuracy (FP32,98.06% .vs. INT8, 97.93%).
>
> We will update a summary of our solution in this PR soon.
>
>
>
> I hope both CPU and GPU can be compatible and share the common code base
> together. So, I think we need more discussion in the PR :)
>
>
>
> 3)      RNN implementations
>
> Currently, there is no CPU implementation for mx.sym.rnn and the python
> implementation is really slower.
>
> We are working on resolving this issue from two aspects.:
>
> -          Provide the C/C++ level implementation, registering by
> FCompute<cpu> (GPU code should be moved to NNVM as well).
>
> We plan to PR the LSTM/GRU in the March and our initial results as below,
> FYI
>             Size :N = 12, T = 1600, I = 161, H = 1760 (from the first
> layer of deep speech 2)
> Forward
>
> mx.sym.gru binded Intel GRU C(s)
>
> Native mx.rnn.GRUCell(s)
>
> SKX 6148, 2 socket
>
> 1.32
>
> 72.7
>
>
>
>
> -          Provide the MKL-DNN RNN interface (under development,
> https://github.com/intel/mkl-dnn/issues/46), registering by
> FComputeEx<cpu>
>
> The higher performance RNN is under development by MKL-DNN team. And we
> will merge it when it's ready.
>
> I think the CPU user can get further performance boost by MKL-DNN library.
>
>      Thanks in advance!
>
>      BR,
>
>     -- Patric
>
>
>

RE: Intel Plan for the contribution to MXNET

Posted by "Zhao, Patric" <pa...@intel.com>.
Thanks, Jun, please see my comments inline.

Wenting and Jin will follow up on the tasks in the PR.

From: Jun Wu [mailto:wujun.nju@gmail.com]
Sent: Thursday, February 1, 2018 12:40 PM
To: dev@mxnet.incubator.apache.org
Cc: Ye, Jason Y <ja...@intel.com>; Lv, Tao A <ta...@intel.com>; Jiang, Wenting <we...@intel.com>; Zhao, Patric <pa...@intel.com>
Subject: Re: Intel Plan for the contribution to MXNET

Hi Patric,

Thanks for the contribution. It’s great to see actions on developing INT8 inference for CPU! I have a few questions and hope you can answer them.


1.      When you said your work is aligned with PR9552<https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you used the quantization+calibration flow developed in that PR for benchmarking inference?

[Patric] The benchmark accuracy is based on MKL-DNN and Ziheng’s old quantization branch.

Now we have moved to master (based on #8302) together with the quantization+calibration PR for INT8 development, and we will show you the accuracy and performance soon.



2.      In your MNIST benchmark, which operators are quantized?

[Patric] Conv, relu, and flatten are quantized in our MNIST benchmark (conv+relu+flatten+FC+softmax).

Besides, MKL-DNN supports INT8 pooling, concat, and fused ops (conv with relu/elementwise/bn).



3.      Is the MNIST quantized model calibrated?

[Patric] Not yet; we did the experiment on Ziheng’s old quantization branch, and now we are moving to the quantization+calibration PR branch (a flow sketch follows these answers).



4.      Is the inference accuracy of INT8 produced by the calibrated quantized model, or by the quantized model without calibration?

[Patric] Without calibration



5.      What are the throughputs of the FP32 model and the INT8 model for inference, respectively?

[Patric] At this stage, we are mainly focused on accuracy and the algorithm. Performance fine-tuning is on the way ☺
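
A minimal sketch of the calibrated flow being moved to, assuming the API
lands roughly as proposed in PR #9552; the checkpoint prefix, calibration
data, and settings below are placeholders.

    import mxnet as mx
    from mxnet.contrib.quantization import quantize_model

    # Trained FP32 model (placeholder checkpoint prefix and epoch).
    sym, arg_params, aux_params = mx.model.load_checkpoint('mnist-fp32', 0)

    # Small held-out calibration set (placeholder data).
    calib_data = mx.io.NDArrayIter(data=mx.nd.ones((500, 1, 28, 28)), batch_size=50)

    # Quantize and calibrate: collected thresholds are folded into the quantized
    # symbol, so no min/max computation is needed at inference time.
    qsym, qarg_params, qaux_params = quantize_model(
        sym=sym, arg_params=arg_params, aux_params=aux_params,
        ctx=mx.cpu(), calib_mode='entropy',
        calib_data=calib_data, num_calib_examples=500)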

Thanks,
Jun
