You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/30 06:34:28 UTC

[GitHub] [tvm-rfcs] alter-xp opened a new pull request, #75: [RFC][Backend] RFC-CSI-NN2-Integration

alter-xp opened a new pull request, #75:
URL: https://github.com/apache/tvm-rfcs/pull/75

   Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890104292


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.

Review Comment:
   We are using the [D1 development board ](https://d1.docs.aw-ol.com/en) and QEMU for testing now. You can use QEMU to test. relevant documents can be viewed [here](https://github.com/apache/tvm/commit/f72fdf0a4d13ffc46cec6a04be51929414964dee#diff-a3ac03442468d62f050614f786fe76e513bad54bace0e3978daa19e3a5f6439a).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890061407


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.
+
+# Reference-level explanation
+
+The Relay graph as lowered from the TVM's frontend will be partitioned into subgraphs via running `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph` Relay passes. Our current implementation uses JSON as a level of abstraction between relay operators and CSINN2 functions (or layers). Here is an overview of the flow from compilation to runtime:
+
+- Front-end graph (Currently only NCHW is supported).
+- Lower to relay graph.
+- Run MergeComposite to create a mapping of relay operators to CSINN2 functions.
+- `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph`.
+- Use the codegen stage to convert Relay operators annotated for CSINN2 to JSON.
+- Use CSINNJSONSerializer serialize JSON and constant tensors into `mod.so` .
+
+*CSINN runtime module context*
+
+- Load `mod.so` and deserialize JSON and constant tensors.
+- Create CSINN2 functions from JSON representation and cache.
+- The cached functions are exposed to the graph runtime as packed functions.
+
+Following code block shows the resultant IRModule post partitioning.
+
+```shell
+def @main(%data: Tensor[(1, 3, 24, 24), float32]) -> Tensor[(1, 10, 12, 12), float32] {
+  @tvmgen_default_csinn_main_0(%data) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+
+def @tvmgen_default_csinn_main_0(%csinn_0_i0: Tensor[(1, 3, 24, 24), float32], Inline=1, Compiler="csinn", global_symbol="tvmgen_default_csinn_main_0", Primitive=1) -> Tensor[(1, 10, 12, 12), float32] {
+  %1 = fn (%FunctionVar_0_0: Tensor[(1, 3, 24, 24), float32], PartitionedFromPattern="nn.conv2d_nn.bias_add_", Composite="csinn.conv2d") -> Tensor[(1, 10, 12, 12), float32] {
+    %0 = nn.conv2d(%FunctionVar_0_0, meta[relay.Constant][0] /* ty=Tensor[(10, 3, 3, 3), float32] */, strides=[2, 2], padding=[1, 1, 1, 1]) /* ty=Tensor[(1, 10, 12, 12), float32] */;
+    nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(10), float32] */) /* ty=Tensor[(1, 10, 12, 12), float32] */
+  };
+  %1(%csinn_0_i0) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+```
+
+## Build system
+
+The current implementation has two separate build options in CMake. The reason for this split is because the optimized code for RISC-V cannot be used on an x86 machine.  We can set the flag to decide to generate code running on X86 or RISC-V.
+
+```cmake
+* USE_CSINN=OFF/ON/path-to-CSINN2
+   * OFF - disable CSINN2 support. (default)
+   * ON - add support for compiling CSINN2 codegen.
+   * path-to-CSINN2 - use a specific version of the CSI-NN2 compute library.
+* USE_CSINN_DEVICE_RUNTIME=OFF/X86/C906
+   * OFF - disable CSINN2 runtime support. (default)
+   * X86 - compiling CSINN2 runtime for x86 device.
+   * C906 - cross-compiling CSINN2 runtime for C906 device.
+```
+
+# Testing
+
+Firstly, we will be providing unit tests for the components described above.
+
+Secondly, we are planning to use QEMU in the CI to be able to simulate the result running on C906.

Review Comment:
   Thanks for comments. C906 supports running Linux. But we do need a custom [qemu](https://github.com/T-head-Semi/qemu).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r894250400


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   Thanks for advice, `-mattr` is a good choice. so we don't have to add anything extra. I update it to `llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890104292


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.

Review Comment:
   We are using the [D1 development board ](https://d1.docs.aw-ol.com/en) and QEMU for testing now. You can use QEMU to test. relevant documents can be viewed [here](https://github.com/apache/tvm/commit/f72fdf0a4d13ffc46cec6a04be51929414964dee#diff-330c6f2d08738b9f7e5880b9fe245798559202d161db73243533525100a7d459).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

areusch commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r892910809


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   ok, cc @kparzysz-quic @csullivan @tqchen @u99127 in case they have thoughts on adding `-march` to `llvm` target. i think my personal inclination is not to do magic under the covers, but my concern here is that it would expand the way we identify CPUs and architectures when deciding whether we can enable various schedules.
   
   either way could you update the RFC text to include this part after the discussion?



##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.
+
+# Reference-level explanation
+
+The Relay graph as lowered from the TVM's frontend will be partitioned into subgraphs via running `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph` Relay passes. Our current implementation uses JSON as a level of abstraction between relay operators and CSINN2 functions (or layers). Here is an overview of the flow from compilation to runtime:
+
+- Front-end graph (Currently only NCHW is supported).
+- Lower to relay graph.
+- Run MergeComposite to create a mapping of relay operators to CSINN2 functions.
+- `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph`.
+- Use the codegen stage to convert Relay operators annotated for CSINN2 to JSON.
+- Use CSINNJSONSerializer serialize JSON and constant tensors into `mod.so` .
+
+*CSINN runtime module context*
+
+- Load `mod.so` and deserialize JSON and constant tensors.
+- Create CSINN2 functions from JSON representation and cache.
+- The cached functions are exposed to the graph runtime as packed functions.
+
+Following code block shows the resultant IRModule post partitioning.
+
+```shell
+def @main(%data: Tensor[(1, 3, 24, 24), float32]) -> Tensor[(1, 10, 12, 12), float32] {
+  @tvmgen_default_csinn_main_0(%data) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+
+def @tvmgen_default_csinn_main_0(%csinn_0_i0: Tensor[(1, 3, 24, 24), float32], Inline=1, Compiler="csinn", global_symbol="tvmgen_default_csinn_main_0", Primitive=1) -> Tensor[(1, 10, 12, 12), float32] {
+  %1 = fn (%FunctionVar_0_0: Tensor[(1, 3, 24, 24), float32], PartitionedFromPattern="nn.conv2d_nn.bias_add_", Composite="csinn.conv2d") -> Tensor[(1, 10, 12, 12), float32] {
+    %0 = nn.conv2d(%FunctionVar_0_0, meta[relay.Constant][0] /* ty=Tensor[(10, 3, 3, 3), float32] */, strides=[2, 2], padding=[1, 1, 1, 1]) /* ty=Tensor[(1, 10, 12, 12), float32] */;
+    nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(10), float32] */) /* ty=Tensor[(1, 10, 12, 12), float32] */
+  };
+  %1(%csinn_0_i0) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+```
+
+## Build system
+
+The current implementation has two separate build options in CMake. The reason for this split is because the optimized code for RISC-V cannot be used on an x86 machine.  We can set the flag to decide to generate code running on X86 or RISC-V.
+
+```cmake
+* USE_CSINN=OFF/ON/path-to-CSINN2
+   * OFF - disable CSINN2 support. (default)
+   * ON - add support for compiling CSINN2 codegen.
+   * path-to-CSINN2 - use a specific version of the CSI-NN2 compute library.
+* USE_CSINN_DEVICE_RUNTIME=OFF/X86/C906
+   * OFF - disable CSINN2 runtime support. (default)
+   * X86 - compiling CSINN2 runtime for x86 device.
+   * C906 - cross-compiling CSINN2 runtime for C906 device.
+```
+
+# Testing
+
+Firstly, we will be providing unit tests for the components described above.
+
+Secondly, we are planning to use QEMU in the CI to be able to simulate the result running on C906.

Review Comment:
   ok, then for CI do you plan to e.g. expand our `ci_qemu` Docker image to additionally contain this custom qemu? (this involves committing a change to `docker/`, then pinging a committer to update the version of the image used)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890162181


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   The CSI-NN2 library supports all RISCV CPUs with vector extensions. when compiling here，llvm only use "rv64gc" instruction set which same as sifive-u74.  Because llvm does not support directly specifying the CPU target as c906 for the time being. We plan to have two options: 1) add a `-march` field and pass it directly to llvm. 2) add the `c906` option (and possibly more options in the future) in the `-mcpu` field, and internally replace it with `march`. Which do you think is more acceptable?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890119231


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.

Review Comment:
   We are using version C RPC here. not Python rpc in docs. I updated this part.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] areusch merged pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

areusch merged PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890104292


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.

Review Comment:
   We are using the [D1 development board ](https://d1.docs.aw-ol.com/en) and QEMU for testing now. You can use QEMU to test. relevant documents can be viewed [here](https://github.com/apache/tvm/commit/2a051f39a22110a403ba2e44b5384ee0085a534b#diff-330c6f2d08738b9f7e5880b9fe245798559202d161db73243533525100a7d459).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890060752


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.
+
+# Reference-level explanation
+
+The Relay graph as lowered from the TVM's frontend will be partitioned into subgraphs via running `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph` Relay passes. Our current implementation uses JSON as a level of abstraction between relay operators and CSINN2 functions (or layers). Here is an overview of the flow from compilation to runtime:
+
+- Front-end graph (Currently only NCHW is supported).
+- Lower to relay graph.
+- Run MergeComposite to create a mapping of relay operators to CSINN2 functions.
+- `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph`.
+- Use the codegen stage to convert Relay operators annotated for CSINN2 to JSON.
+- Use CSINNJSONSerializer serialize JSON and constant tensors into `mod.so` .
+
+*CSINN runtime module context*
+
+- Load `mod.so` and deserialize JSON and constant tensors.
+- Create CSINN2 functions from JSON representation and cache.
+- The cached functions are exposed to the graph runtime as packed functions.
+
+Following code block shows the resultant IRModule post partitioning.
+
+```shell
+def @main(%data: Tensor[(1, 3, 24, 24), float32]) -> Tensor[(1, 10, 12, 12), float32] {
+  @tvmgen_default_csinn_main_0(%data) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+
+def @tvmgen_default_csinn_main_0(%csinn_0_i0: Tensor[(1, 3, 24, 24), float32], Inline=1, Compiler="csinn", global_symbol="tvmgen_default_csinn_main_0", Primitive=1) -> Tensor[(1, 10, 12, 12), float32] {
+  %1 = fn (%FunctionVar_0_0: Tensor[(1, 3, 24, 24), float32], PartitionedFromPattern="nn.conv2d_nn.bias_add_", Composite="csinn.conv2d") -> Tensor[(1, 10, 12, 12), float32] {
+    %0 = nn.conv2d(%FunctionVar_0_0, meta[relay.Constant][0] /* ty=Tensor[(10, 3, 3, 3), float32] */, strides=[2, 2], padding=[1, 1, 1, 1]) /* ty=Tensor[(1, 10, 12, 12), float32] */;
+    nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(10), float32] */) /* ty=Tensor[(1, 10, 12, 12), float32] */
+  };
+  %1(%csinn_0_i0) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+```
+
+## Build system
+
+The current implementation has two separate build options in CMake. The reason for this split is because the optimized code for RISC-V cannot be used on an x86 machine.  We can set the flag to decide to generate code running on X86 or RISC-V.
+
+```cmake
+* USE_CSINN=OFF/ON/path-to-CSINN2
+   * OFF - disable CSINN2 support. (default)
+   * ON - add support for compiling CSINN2 codegen.
+   * path-to-CSINN2 - use a specific version of the CSI-NN2 compute library.
+* USE_CSINN_DEVICE_RUNTIME=OFF/X86/C906
+   * OFF - disable CSINN2 runtime support. (default)
+   * X86 - compiling CSINN2 runtime for x86 device.
+   * C906 - cross-compiling CSINN2 runtime for C906 device.
+```
+
+# Testing
+
+Firstly, we will be providing unit tests for the components described above.
+
+Secondly, we are planning to use QEMU in the CI to be able to simulate the result running on C906.

Review Comment:
   thanks for  C906 supports running Linux. But we do need a custom [qemu](https://github.com/T-head-Semi/qemu).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] areusch commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

areusch commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r886026280


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.
+
+# Reference-level explanation
+
+The Relay graph as lowered from the TVM's frontend will be partitioned into subgraphs via running `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph` Relay passes. Our current implementation uses JSON as a level of abstraction between relay operators and CSINN2 functions (or layers). Here is an overview of the flow from compilation to runtime:
+
+- Front-end graph (Currently only NCHW is supported).
+- Lower to relay graph.
+- Run MergeComposite to create a mapping of relay operators to CSINN2 functions.
+- `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph`.
+- Use the codegen stage to convert Relay operators annotated for CSINN2 to JSON.
+- Use CSINNJSONSerializer serialize JSON and constant tensors into `mod.so` .
+
+*CSINN runtime module context*
+
+- Load `mod.so` and deserialize JSON and constant tensors.
+- Create CSINN2 functions from JSON representation and cache.
+- The cached functions are exposed to the graph runtime as packed functions.
+
+Following code block shows the resultant IRModule post partitioning.
+
+```shell
+def @main(%data: Tensor[(1, 3, 24, 24), float32]) -> Tensor[(1, 10, 12, 12), float32] {
+  @tvmgen_default_csinn_main_0(%data) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+
+def @tvmgen_default_csinn_main_0(%csinn_0_i0: Tensor[(1, 3, 24, 24), float32], Inline=1, Compiler="csinn", global_symbol="tvmgen_default_csinn_main_0", Primitive=1) -> Tensor[(1, 10, 12, 12), float32] {
+  %1 = fn (%FunctionVar_0_0: Tensor[(1, 3, 24, 24), float32], PartitionedFromPattern="nn.conv2d_nn.bias_add_", Composite="csinn.conv2d") -> Tensor[(1, 10, 12, 12), float32] {
+    %0 = nn.conv2d(%FunctionVar_0_0, meta[relay.Constant][0] /* ty=Tensor[(10, 3, 3, 3), float32] */, strides=[2, 2], padding=[1, 1, 1, 1]) /* ty=Tensor[(1, 10, 12, 12), float32] */;
+    nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(10), float32] */) /* ty=Tensor[(1, 10, 12, 12), float32] */
+  };
+  %1(%csinn_0_i0) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+```
+
+## Build system
+
+The current implementation has two separate build options in CMake. The reason for this split is because the optimized code for RISC-V cannot be used on an x86 machine.  We can set the flag to decide to generate code running on X86 or RISC-V.
+
+```cmake
+* USE_CSINN=OFF/ON/path-to-CSINN2
+   * OFF - disable CSINN2 support. (default)
+   * ON - add support for compiling CSINN2 codegen.
+   * path-to-CSINN2 - use a specific version of the CSI-NN2 compute library.
+* USE_CSINN_DEVICE_RUNTIME=OFF/X86/C906
+   * OFF - disable CSINN2 runtime support. (default)
+   * X86 - compiling CSINN2 runtime for x86 device.
+   * C906 - cross-compiling CSINN2 runtime for C906 device.
+```
+
+# Testing
+
+Firstly, we will be providing unit tests for the components described above.
+
+Secondly, we are planning to use QEMU in the CI to be able to simulate the result running on C906.

Review Comment:
   what's the target operating system for this deployment environment? do you guys need a custom qemu, or would a stock one work? we currently use qemu to run Zephyr unit tests, but i'm not sure if that's the right OS for this work.



##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.

Review Comment:
   could you guys specify how you expect users to do this? e.g. you could just link to [rpc docs](https://tvm.apache.org/docs/tutorial/cross_compilation_and_rpc.html) if it's the standard flow



##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.

Review Comment:
   are you guys able to say which device you're using to test?



##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   could you guys talk a bit more about any required hardware support, and how you plan to determine if a riscv target supports those extensions based on the Target string?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r890095499


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):
+      factory = tvm.relay.build(csinn_module)
+  
+  # Export the module
+  lib_path = "lib_csinn2.so"
+  cross_compile = 'riscv64-unknown-linux-gnu-g++'
+  lib.export_library(lib_path, cc=cross_compile)
+  ```
+
+- Running RPC service on device.
+
+- Connect the device and run.
+
+# Reference-level explanation
+
+The Relay graph as lowered from the TVM's frontend will be partitioned into subgraphs via running `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph` Relay passes. Our current implementation uses JSON as a level of abstraction between relay operators and CSINN2 functions (or layers). Here is an overview of the flow from compilation to runtime:
+
+- Front-end graph (Currently only NCHW is supported).
+- Lower to relay graph.
+- Run MergeComposite to create a mapping of relay operators to CSINN2 functions.
+- `AnnotateTarget`, `MergeCompilerRegions` and `PartitionGraph`.
+- Use the codegen stage to convert Relay operators annotated for CSINN2 to JSON.
+- Use CSINNJSONSerializer serialize JSON and constant tensors into `mod.so` .
+
+*CSINN runtime module context*
+
+- Load `mod.so` and deserialize JSON and constant tensors.
+- Create CSINN2 functions from JSON representation and cache.
+- The cached functions are exposed to the graph runtime as packed functions.
+
+Following code block shows the resultant IRModule post partitioning.
+
+```shell
+def @main(%data: Tensor[(1, 3, 24, 24), float32]) -> Tensor[(1, 10, 12, 12), float32] {
+  @tvmgen_default_csinn_main_0(%data) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+
+def @tvmgen_default_csinn_main_0(%csinn_0_i0: Tensor[(1, 3, 24, 24), float32], Inline=1, Compiler="csinn", global_symbol="tvmgen_default_csinn_main_0", Primitive=1) -> Tensor[(1, 10, 12, 12), float32] {
+  %1 = fn (%FunctionVar_0_0: Tensor[(1, 3, 24, 24), float32], PartitionedFromPattern="nn.conv2d_nn.bias_add_", Composite="csinn.conv2d") -> Tensor[(1, 10, 12, 12), float32] {
+    %0 = nn.conv2d(%FunctionVar_0_0, meta[relay.Constant][0] /* ty=Tensor[(10, 3, 3, 3), float32] */, strides=[2, 2], padding=[1, 1, 1, 1]) /* ty=Tensor[(1, 10, 12, 12), float32] */;
+    nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(10), float32] */) /* ty=Tensor[(1, 10, 12, 12), float32] */
+  };
+  %1(%csinn_0_i0) /* ty=Tensor[(1, 10, 12, 12), float32] */
+}
+```
+
+## Build system
+
+The current implementation has two separate build options in CMake. The reason for this split is because the optimized code for RISC-V cannot be used on an x86 machine.  We can set the flag to decide to generate code running on X86 or RISC-V.
+
+```cmake
+* USE_CSINN=OFF/ON/path-to-CSINN2
+   * OFF - disable CSINN2 support. (default)
+   * ON - add support for compiling CSINN2 codegen.
+   * path-to-CSINN2 - use a specific version of the CSI-NN2 compute library.
+* USE_CSINN_DEVICE_RUNTIME=OFF/X86/C906
+   * OFF - disable CSINN2 runtime support. (default)
+   * X86 - compiling CSINN2 runtime for x86 device.
+   * C906 - cross-compiling CSINN2 runtime for C906 device.
+```
+
+# Testing
+
+Firstly, we will be providing unit tests for the components described above.
+
+Secondly, we are planning to use QEMU in the CI to be able to simulate the result running on C906.

Review Comment:
   Thanks for comments. C906 supports running Linux. But we do need a custom [qemu](https://github.com/T-head-Semi/qemu). We provide a downloaded script for qemu in CSI-NN2. we plan to [download](https://github.com/apache/tvm/commit/e1f33130e847d6c29b2b4c4e5eba3ca37c89f8cd#diff-a3ac03442468d62f050614f786fe76e513bad54bace0e3978daa19e3a5f6439a) it after clone CSI-NN2 in CI environmnet.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1147389744

   > This is fantastic, thank you! We're excited to hear about the results for MLPerf. My main comment is mostly concerned about documentation. I'm glad the build instructions are included with the RFC, but I'd like to see the inclusion of documentation about how to configure, build, and use the platform be explicitly updated and included.
   
   Thanks for comments.  we provide a documentation in [PR for TVM](https://github.com/apache/tvm/commit/e1f33130e847d6c29b2b4c4e5eba3ca37c89f8cd#diff-330c6f2d08738b9f7e5880b9fe245798559202d161db73243533525100a7d459). But I'm not sure whether these contents need to be completely written in RPC


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] hogepodge commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

hogepodge commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1142328577

   This is fantastic, thank you! We're excited to hear about the results for MLPerf. My main comment is mostly concerned about documentation. I'm glad the build instructions are included with the RFC, but I'd like to see the inclusion of documentation about how to configure, build, and use the platform be explicitly updated and included.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] kparzysz-quic commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

kparzysz-quic commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r892919577


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   You could use `-mattr` for this, for example `-mattr=+m,+a,+f,+d,+c`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] kparzysz-quic commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

kparzysz-quic commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r893861631


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   We should not add `-march` to the `llvm` targe, because it doesn't mean anything.  Compilation target in LLVM is fully specified by triple, CPU, tune flag, and feature string.  There is no extra `arch` in there.  There are `-march` flags in various LLVM utilities, but they correspond to (or alter) the first component of the triple.



##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   We should not add `-march` to the `llvm` target, because it doesn't mean anything.  Compilation target in LLVM is fully specified by triple, CPU, tune flag, and feature string.  There is no extra `arch` in there.  There are `-march` flags in various LLVM utilities, but they correspond to (or alter) the first component of the triple.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] areusch commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

areusch commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1152720214

   @alter-xp we have some:
   - about the CI environment: https://tvm.apache.org/docs/contribute/pull_request.html#ci-environment
   - about building docker containers locally: https://github.com/apache/tvm/blob/main/docker/README.md
   
   we need to write that up :/
   
   the basics are:
   1. submit a PR which only modifies docker/ scripts. the CI will rebuild docker containers and then run through using the newly-rebuilt containers. 
   2. ping a committer to publish the built images to the `tlcpack` dockerhub org
   3. submit another PR which modifies jenkins/Jenkinsfile.j2 (and run generate.py there to modify Jenkinsfile) to reference the newly-pushed containers. you can also add tests which depend on the newly-added dependencies here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] areusch commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

areusch commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1156983012

   thanks @alter-xp, the PR is now merged! please open an RFC tracking issue and list out the PRs we can expect there, then we can proceed on the CI changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r894251754


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   > ok, cc @kparzysz-quic @csullivan @tqchen @u99127 in case they have thoughts on adding `-march` to `llvm` target. i think my personal inclination is not to do magic under the covers, but my concern here is that it would expand the way we identify CPUs and architectures when deciding whether we can enable various schedules.
   > 
   > either way could you update the RFC text to include this part after the discussion?
   
   I updated this part.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1152099008

   > ok, then for CI do you plan to e.g. expand our `ci_qemu` Docker image to additionally contain this custom qemu? (this involves committing a change to `docker/`, then pinging a committer to update the version of the image used)
   
   Thanks for advice, we will add it. but I have no experience about this. Is there any relevant process? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1153408149

   @areusch Thanks a lot. let me update the image.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] kparzysz-quic commented on a diff in pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

kparzysz-quic commented on code in PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#discussion_r892919577


##########
rfcs/0075_RISC-V_CSI-NN2_Intergration.md:
##########
@@ -0,0 +1,171 @@
+- Feature Name: [RFC] RISC-V CSI-NN2 Compute Library integration
+- Start Date: 2022-5-19
+- RFC PR: https://github.com/apache/tvm-rfcs/pull/75
+- GitHub Issue: [https://github.com/apache/tvm/issues/11506](https://github.com/apache/tvm/issues/11506)
+
+# Summary
+
+Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension.
+
+# Motivation
+
+Recently, in the latest Tiny v0.7 list released by AI benchmark MLPerf. Alibaba’s T-Head XuanTie RISC-V C906 processor has achieved first place in all 4 indicators. So, it’s a good time to support RISC-V CPUs with vector extension in TVM.
+
+[CSI-NN2 Compute Library](https://github.com/T-head-Semi/csi-nn2)(CSINN2) is an open-source project that provides hand-crafted assembler routines for RISC-V CPUs with vector extension. It is compatible with RISC-V v0.7.1 and v1.0 vector extension instruction standards. This integration will look at how we can accelerate CPU performance for RISC-V devices like XuanTie C906 in TVM using CSINN2. The idea is that by converting operators from a relay graph to CSINN2 we can achieve faster inference times due to these routines. The initial intention is that this will improve performance for FP32 models. Although, with further improvements to the integration this will extend to quantized models and support for a wider range of operators.
+
+PS: If you are interested in XuanTie C906 processor, [the D1 development board](https://d1.docs.aw-ol.com/en/) is a good choice.
+
+# Guide-level explanation
+
+## Build
+
+- Build with CSI-NN2 support in `build`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_OPENMP gnu)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME X86)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4
+    ```
+
+- Cross-compile CSI-NN2 support in `build-rv`
+  
+  - Set in your config.cmake file
+    
+    ```cmake
+    set(USE_CPP_RPC ON)
+    set(USE_LIBBACKTRACE OFF)
+    set(USE_CSINN /path/to/csi-nn2)
+    set(USE_CSINN_DEVICE_RUNTIME C906)
+    ```
+  
+  - Execute on the command lin
+    
+    ```shell
+    cmake ..;make -j4 runtime tvm_rpc
+    ```
+  
+  After building successfully, we need to copy tvm_rpc and libs which used to device.
+
+## Run
+
+- Export binary library
+  
+  For a relay graph, following python APIs can be used to generate the binary library.
+  
+  ```python
+  from tvm.relay.op.contrib import csinn
+  
+  # API to call CSINN2 partitioning
+  # Here, module is the relay module
+  csinn_module = csinn.partition_for_csinn(module)
+  
+  # Build the Relay graph.
+  with tvm.target.Target("llvm -mtriple=riscv64-unknown-linux-gnu -mcpu=sifive-u74 -mabi=lp64d"):

Review Comment:
   You could use `-mattr` for this, for example `-mattr=+64bit,+m,+a,+f,+d,+c`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm-rfcs] alter-xp commented on pull request #75: [RFC][Backend] RFC-CSI-NN2-Integration

Posted by GitBox <gi...@apache.org>.

alter-xp commented on PR #75:
URL: https://github.com/apache/tvm-rfcs/pull/75#issuecomment-1157154054

   @areusch  [a tracking issue](https://github.com/apache/tvm/issues/11506) is ready. I will keep updating it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org