Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/05/24 07:27:51 UTC

[GitHub] [tvm-rfcs] MichaelJKlaiber commented on a diff in pull request #60: [RFC] UMA Universal Modular Accelerator Interface

MichaelJKlaiber commented on code in PR #60:
URL: https://github.com/apache/tvm-rfcs/pull/60#discussion_r880152269


##########
rfcs/00xx_UMA_Unified_Modular_Accelerator_Interface.md:
##########
@@ -0,0 +1,390 @@
+# UMA: Universal Modular Accelerator Interface
+
+    Feature Name: Universal Modular Accelerator Interface (UMA)
+    Start Date: 2022 February
+    Authors:
+      Paul Palomero Bernardo @paulpb, Christoph Gerum @cgerum - University of Tübingen
+      Michael J. Klaiber @mjklaiber, Ingo Feldner - Bosch Research
+      Philipp van Kempen @philippvk, Rafael Stahl @r.stahl, Daniel Müller-Gritschneder - Technical University of Munich
+      Johannes Partzsch - TU Dresden
+      Andrew Stevens - Infineon Technologies
+    RFC PR: TBD
+    GitHub Issue: TBD
+
+## Summary
+
+<img src="https://live.staticflickr.com/98/234261205_63fa6a3412_b.jpg" align="left" width="200px"/>
+
+
+The goal of **UMA (Universal Modular Accelerator Interface)** is to create a unified infrastructure for easily integrating external accelerators into TVM. 
+UMA provides file structures, Python interface classes, and an API for accelerator integration. These interfaces and APIs are accessible from Python and are part of the components *UMA Partitioner*, *UMA Lower*, and *UMA Codegen*. 
+The features and proposals of *Target registered compiler flow customization* [TVM-RFC0011] and [TVM-RFC0010] are taken into account, with the difference that UMA aims to provide a more general interface for integrating new accelerators and one specific implementation of the hooks described in [TVM-RFC0011]. 
+
+
+
+<br clear="left"/>
+
+<sub><sup> Image Source:  https://www.flickr.com/photos/luvi/234261205 under CC BY-NC-ND 2.0</sup></sub>
+
+
+
+##  Goal and Motivation
+
+A number of accelerators have already been integrated into TVM, e.g. VTA, ARM EthosU. 
+These are similar in both the structure of their build flow and the operations that they can offload.
+Nonetheless, due to incremental, independent development, the TVM interfaces and processing steps they use differ considerably and share little commonality. A consistent, unified infrastructure would simplify accelerator integration, making it accessible to smaller, hardware-focused development teams.
+
+The **goal** of UMA is to establish two API layers with different target groups of users:
+
+**Porcelain Layer**: UMA
+  - Straightforward, *Python-only*, and stable API wrapper around the plumbing layer
+  - Easy and clearly defined template for the integration of accelerators
+  - Short learning period for hardware/software engineers new to TVM
+
+**Plumbing Layer**: 
+  - Collage-like API [COLLAGE-RFC](https://github.com/mbs-octoml/mbs-tvm-rfcs/blob/mbs-rfcs-collage/rfcs/xxxx-collage.md) + other TVM APIs
+  - Powerful API to core-compiler + other TVM features
+  - Target audience is experienced TVM users/developers
+  - C++ and Python APIs
+
+
+## Focus
+
+UMA's primary objective is to enable straightforward TVM integration of loosely-coupled, processor- or microcontroller-controlled accelerators. That is, accelerators capable of executing complete tensor operations or operation graphs without host processor intervention.
+Secondary objectives are:
+
+* Support for closely-coupled accelerators (those that offload parts of the CPU computation for significant elements of tensor operations)
+* Compatibility with both run-time and ahead-of-time compilation
+* Support for heterogeneous execution utilizing accelerators optimized for specific operations or data types
+
+Accelerator support or optimization functions **outside** the scope of UMA are:
+
+* Parallel execution on multi-accelerator architectures (to be handled by executor/run-time and customized layer splitting)
+* Real-time execution (to be handled by executor/run-time)
+* High-level support for parameter conversion like quantization or sparsity exploitation (to be realized via model pre-processing or in accelerator backends)
+
+## Guide-level explanation 
+
+
+### Flow description 
+
+
+
+The figure below describes the UMA interface from a top level. An *Accelerator Partitioner*, which is a specialization of the *UMA Partitioner*, takes the Relay graph and matches supported and unsupported operators. Unsupported operators are processed with the default TVM flow; supported operators are processed with the **UMA Pipeline**.
+The tasks and the functionality of each block in the figure below are described in the following:
+
+![](uma_toplevel.png)
+
+UMA Partitioner: 
+* Register Relay passes
+* Register patterns - supported sub-graph operations (a pattern sketch is shown after this list)
+* Order: pre-partitioning passes, graph partitioning, post-partitioning passes
+* API level:
+    * UMA Partitioner creates a wrapper API around TVM core-compiler APIs
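+
+As an illustration, a pattern such as the `conv1d_relu_pattern()` used in the backend example further below could be expressed with TVM's existing Relay pattern language. This is a minimal sketch under that assumption, not part of the API proposed by this RFC; only the registration function itself is defined by UMA.
+
+```python
+# Sketch (assumption): one way to define the conv1d + relu pattern that is
+# registered via `_register_pattern` in the UMABackend example below.
+from tvm.relay.dataflow_pattern import is_op, wildcard
+
+
+def conv1d_relu_pattern():
+    """Match a nn.conv1d followed by nn.relu."""
+    conv1d = is_op("nn.conv1d")(wildcard(), wildcard())
+    return is_op("nn.relu")(conv1d)
+```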
+
+The figure below describes the *UMA Pipeline*. Its blocks are described below:
+
+![](uma_pipeline.png)
+
+UMA Pipeline:
+* Consists of UMALower and UMACodegen, which implement the target hooks Relay-to-TIR and TIR-to-Runtime (proposed in [TVM-RFC0010])
+* UMALower
+  * Input: Partitioned composite functions
+  * Custom primitives can be registered
+  * Lowering from Relay to S-TIR, using TOPI or custom primitives 
+  * Interface for registering accelerator-specific passes (a pass sketch is shown after this list)
+  * Execution of UMA passes on S-TIR and NS-TIR
+  * Output: NS-TIR (including tir.extern calls)
+* UMACodegen
+  * Input: NS-TIR (including tir.extern calls)
+  * Defaults to standard TVM codegen
+  * Intent is to provide a Python interface to insert/emit target code
+  * Output: Target .c files
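+
+To make the pass registration concrete, the sketch below shows one way an accelerator-specific TIR pass such as `CodegenGenerateExternCalls` could be written using TVM's existing pass infrastructure. The class name is taken from the example below, but its body here is an illustrative assumption; only the registration function and pass phases are part of the proposed UMA API.
+
+```python
+# Sketch (assumption): a minimal accelerator-specific TIR pass that could be
+# registered via `_register_tir_pass(PassPhase.TIR_PHASE_0, ...)`.
+import tvm
+
+
+@tvm.tir.transform.prim_func_pass(opt_level=0)
+class CodegenGenerateExternCalls:
+    """Illustrative no-op pass; a real one would rewrite offloaded regions into tir.extern calls."""
+
+    def transform_function(self, func, mod, ctx):
+        # A real implementation would traverse `func` and replace
+        # accelerator-supported regions with calls into the target's C API.
+        return func
+```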
+
+The intention is to use TensorIR with MetaScheduler for optimization and Relax (a possible successor of Relay, [video link](https://www.youtube.com/watch?v=xVbkjJDMexo)) in later versions.
+
+
+Abbreviations:
+* S-TIR: Schedulable TIR
+* NS-TIR: Non-Schedulable TIR
+
+### Adding a New Custom Accelerator
+
+A custom accelerator is added by inheriting from `UMABackend`. New elements (e.g., passes) are added using a registration mechanism. The example below shows a backend that makes use of all available registration functions.
+```python
+"""UMA backend for the UltraTrail accelerator"""
+
+class UltraTrailBackend(UMABackend):
+    def __init__(self):
+        super(UltraTrailBackend, self).__init__()
+
+        # Target configuration
+        self._register_target_attr("dimension", default=8)
+
+        # Relay to Relay function registration
+        self._register_pattern("conv1d_relu", conv1d_relu_pattern())
+
+        self._register_relay_pass(PassPhase.POST_PARTITIONING, ConfigGenerator())
+        self._register_relay_pass(PassPhase.POST_PARTITIONING, BufferScopeAnnotator())
+
+        # Relay to TIR function registration
+        self._register_operator_strategy("nn.conv1d", custom_conv1d_strategy)
+
+        self._register_tir_pass(PassPhase.TIR_PHASE_0, CodegenGenerateExternCalls())
+
+        # TIR to runtime function registration
+        self._register_codegen(format="c", includes=gen_includes, replace_call_extern=None)
+
+    @property
+    def target_name(self):
+        return "ultra_trail"
+```
+
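+The helper functions referenced in the example, such as `gen_includes` passed to `_register_codegen`, are supplied by the backend author. Below is a minimal sketch of what such a helper might look like; the no-argument signature and the header name are assumptions for illustration, not a fixed part of this RFC.
+
+```python
+# Sketch (assumption): a codegen helper passed as `includes=gen_includes`.
+# It returns the #include lines to emit at the top of the generated C file.
+def gen_includes() -> str:
+    return '#include "ultra_trail_runtime.h"\n'  # hypothetical runtime header
+```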

Review Comment:
   Hi @areusch, 10 mins to show the changes are fine.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org