Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/17 14:37:30 UTC

[GitHub] [tvm-rfcs] crazydemo opened a new pull request, #73: RFC-BYOC-DNNL-Integration

crazydemo opened a new pull request, #73:
URL: https://github.com/apache/tvm-rfcs/pull/73

   RFC for OneDNN Integration via BYOC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm-rfcs] areusch merged pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
areusch merged PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73




[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
areusch commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r879666942


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation

Review Comment:
   cool, i agree the CI is not the place for benchmarking. thanks for adding those.
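   For reference, the warm-up/measurement scheme described in the RFC's runtime config (20 warm-up iterations, 100 measured batches) can be sketched as below; `run_batch` is a stand-in for any compiled model invocation and is an assumption for illustration, not part of the RFC:

```python
import time

def benchmark(run_batch, warmup=20, batches=100):
    """Run `run_batch` with warm-up iterations excluded from timing,
    then return the mean latency in milliseconds over the timed batches."""
    for _ in range(warmup):           # warm-up: populate caches, stabilize clocks, etc.
        run_batch()
    start = time.perf_counter()
    for _ in range(batches):          # timed region
        run_batch()
    elapsed = time.perf_counter() - start
    return elapsed / batches * 1000.0  # mean milliseconds per batch

# Trivial stand-in workload in place of a real model invocation:
mean_ms = benchmark(lambda: sum(i * i for i in range(1000)))
```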





[GitHub] [tvm-rfcs] crazydemo commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
crazydemo commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876522703


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation
+This proposal provides a new approach to integrating oneDNN into TVM via the DNNL JSON codegen/runtime, applying the following adjustments to tackle the aforementioned issues:
+- Register a new “alter_op_layout” function for DNNL to obtain the optimal layouts for DNNL ops, using a new layout auto-query function in Relay.
+- Add a custom pass to rewrite the “Conv-Add-Add-ReLU” pattern into “Conv-Add-ReLU” to better handle the pattern that comes from BatchNorm folding (“Conv-bias_add-BN-ReLU”).

Review Comment:
   I updated the description. Actually, we utilize the `Simplify_Expr` pass and the `FoldConstant` pass to turn `conv-add-add-relu` into `conv-add-relu`. More discussion can be found in the [forum](https://discuss.tvm.apache.org/t/rfc-byoc-intel-r-onednn-integration/11582/11).
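   To illustrate the effect (a toy sketch, not the actual TVM passes): when both addends are constants, two stacked adds can be folded into one, which is why `conv-add-add-relu` collapses to `conv-add-relu` after simplification and constant folding. The tuple-based expression tree and all names below are hypothetical:

```python
# Toy expression tree: ("add", lhs, rhs), ("relu", x), ("conv", data, weight),
# or a plain number standing in for a constant tensor.
def fold_adds(expr):
    """Rewrite add(add(x, c1), c2) -> add(x, c1 + c2) when c1 and c2 are constants."""
    if not isinstance(expr, tuple):
        return expr
    expr = tuple(fold_adds(e) for e in expr)  # fold children bottom-up first
    if (expr[0] == "add" and isinstance(expr[2], (int, float))
            and isinstance(expr[1], tuple) and expr[1][0] == "add"
            and isinstance(expr[1][2], (int, float))):
        inner = expr[1]
        return ("add", inner[1], inner[2] + expr[2])  # merge the two constants
    return expr

# conv-add-add-relu (bias add followed by a folded-BatchNorm add) ...
before = ("relu", ("add", ("add", ("conv", "x", "w"), 1.0), 2.0))
after = fold_adds(before)
# ... becomes conv-add-relu with the two constant addends merged.
```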





[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
areusch commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876338527


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)

Review Comment:
   i think you can just use a relative path here and it should render properly. i don't see these images.



##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation

Review Comment:
   could you discuss your test strategy a bit?





[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
areusch commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876341687


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation

Review Comment:
   to clarify, here i mean: how will we test this in TVM CI (or can we)?





[GitHub] [tvm-rfcs] comaniac commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
comaniac commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876367135


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation
+This proposal provides a new approach to integrating oneDNN into TVM via the DNNL JSON codegen/runtime, applying the following adjustments to tackle the aforementioned issues:
+- Register a new “alter_op_layout” function for DNNL to obtain the optimal layouts for DNNL ops, using a new layout auto-query function in Relay.
+- Add a custom pass to rewrite the “Conv-Add-Add-ReLU” pattern into “Conv-Add-ReLU” to better handle the pattern that comes from BatchNorm folding (“Conv-bias_add-BN-ReLU”).

Review Comment:
   Can you just leverage MergeComposite to achieve this without introducing a custom pass?
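   For context, MergeComposite wraps a matched operator sequence into a single composite function that a BYOC codegen can map to one fused kernel. The idea can be sketched on a toy linear op list; the list representation and all names below are hypothetical illustrations, not TVM's actual API:

```python
def merge_composite(ops, pattern, composite_name):
    """Replace every occurrence of `pattern` (a consecutive op sequence)
    in the linear op list `ops` with a single composite node."""
    out, i, n = [], 0, len(pattern)
    while i < len(ops):
        if ops[i:i + n] == pattern:   # the whole sequence matches at position i
            out.append(composite_name)
            i += n
        else:
            out.append(ops[i])
            i += 1
    return out

graph = ["conv2d", "add", "relu", "dense", "conv2d", "add", "relu"]
merged = merge_composite(graph, ["conv2d", "add", "relu"], "dnnl.conv2d_bias_relu")
# Both conv2d-add-relu chains collapse into one composite node each,
# leaving the unmatched "dense" op untouched.
```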





[GitHub] [tvm-rfcs] crazydemo commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
crazydemo commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876521355


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)
+
+![Figure 2 Throughput scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/throughput.png) 
+
+![Figure 3 Real-time scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/real-time.png)
+
+# Reference-level explanation

Review Comment:
   The benchmark results are collected on an AVX512 machine. We tested the `latency` (single instance running with 28 cores, batch size = 1), `throughput` (single instance running with 28 cores, batch size = 32) and `real-time` (7 instances with 4 cores each, batch size = 1) scenarios. These models are all verified; that is, the error between the output tensors (of shape (batch size, 1000)) computed by TVM native codegen and by BYOC-DNNL is within 1e-5. The benchmark scripts can be found in [Benchmark](https://github.com/crazydemo/TLCBench/tree/cascadelake).
   
   We don't think TVM CI is a suitable place for this benchmark mechanism. But we have added test cases in `test_dnnl.py`; all the enabled ops, patterns and functions can be tested by this script.
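   The correctness criterion mentioned above (outputs agreeing within 1e-5) is the usual pattern of checking a backend's result element-wise against a reference. A minimal NumPy sketch, where the two model outputs are stand-ins rather than actual TVM results:

```python
import numpy as np

def verify_outputs(reference, candidate, atol=1e-5):
    """Return (ok, max_err): whether every element of `candidate` agrees
    with `reference` within absolute tolerance `atol`, and the worst error."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    max_err = float(np.max(np.abs(reference - candidate)))
    return max_err <= atol, max_err

# Stand-in outputs of shape (batch, 1000), differing by a tiny rounding error:
ref = np.zeros((1, 1000))
cand = ref + 5e-6
ok, err = verify_outputs(ref, cand)
```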





[GitHub] [tvm-rfcs] crazydemo commented on a diff in pull request #73: RFC-BYOC-DNNL-Integration

Posted by GitBox <gi...@apache.org>.
crazydemo commented on code in PR #73:
URL: https://github.com/apache/tvm-rfcs/pull/73#discussion_r876515483


##########
rfcs/0069-byoc-onednn-integration.md:
##########
@@ -0,0 +1,115 @@
+- Feature Name: oneDNN Integration via BYOC
+- Start Date: 2021-11-29
+- RFC PR: [apache/tvm-rfcs#0069](https://github.com/apache/tvm-rfcs/pull/0069)
+- GitHub PR: [PR#9671](https://github.com/apache/tvm/pull/9671/commits), [PR#9797](https://github.com/apache/tvm/pull/9797/commits), [PR#9995](https://github.com/apache/tvm/pull/9995/commits), [PR#9996](https://github.com/apache/tvm/pull/9996/commits), [PR#10112](https://github.com/apache/tvm/pull/10112/commits), [PR#10266](https://github.com/apache/tvm/pull/10266/commits), [PR#10421](https://github.com/apache/tvm/pull/10421/commits), [PR#10835](https://github.com/apache/tvm/pull/10835/commits), [PR#10836](https://github.com/apache/tvm/pull/10837/commits)
+
+# Summary
+[summary]: #summary
+
+This RFC proposes to integrate oneDNN (DNNL) into TVM via the BYOC framework. The drawbacks of the current "Bring DNNL to TVM via DNNL JSON codegen/runtime" flow are analysed and addressed. Performance benefits are observed in comparison with both MXNet-oneDNN and TVM auto-scheduler on several popular workloads.
+
+# Motivation
+[motivation]: #motivation
+
+TVM has shown good performance on many CV models. One of its major advantages is high throughput, which benefits from low runtime overhead. However, tuning is needed for each new shape, and it usually takes a long time.
+
+oneDNN is an open-source cross-platform performance library of basic building blocks for deep learning applications. The library is optimized for Intel(R) Architecture Processors, Intel(R) Processor Graphics and Xe Architecture graphics. Given a new shape and the environment configuration, oneDNN can infer the optimal data format immediately. To take advantage of TVM's small overhead and achieve the best performance on CPU in a short time, we propose to integrate oneDNN into TVM via the BYOC framework.
+
+Currently, the BYOC homepage provides a simple example of integrating DNNL (now named oneDNN) into TVM, but its performance falls far short of both TVM auto-scheduler and MXNet-oneDNN, for the following main reasons:
+- Non-optimal layouts are used in DNNL ops.
+- Insufficient subgraph partitioning.
+- Unnecessary overhead due to memory copies between tensors and DNNL memory buffers.
+
+# Guide-level explanation
+
+We have solved the above issues and observed performance benefits over both MXNet-oneDNN and TVM auto-scheduler on several popular workloads such as ResNet50_v1b, InceptionV3 and VGG11_bn, in several scenarios: latency (Figure 1, single instance with 28 cores and bs=1), throughput (Figure 2, single instance with 28 cores and bs=32) and real-time (Figure 3, 7 instances with 4 cores each and bs=1).
+
+## Note
+[note]: #note
+
+Hardware config
+- Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
+
+Compilation config
+- g++ 7
+- 'llvm -mcpu=cascadelake -model=platinum-8280'
+- TVM commitID: 19b23b9
+- MXNet version: V1.8.0
+- OneDNN version: V1.7 / V2.4
+
+Runtime config
+- 20 warm-up and 100 batches
+
+![Figure 1 latency scenario](https://github.com/crazydemo/tvm-rfcs/blob/main/rfcs/assets/latest/latency.png)

Review Comment:
   Thanks for your suggestion! The images can be shown now.


