Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/19 15:07:30 UTC

[GitHub] [tvm-rfcs] sunggg commented on a diff in pull request #89: [RFC] Relax Upstreaming

sunggg commented on code in PR #89:
URL: https://github.com/apache/tvm-rfcs/pull/89#discussion_r950294616


##########
rfcs/0089-relax-upstreaming.md:
##########
@@ -0,0 +1,701 @@
+- Feature Name: Relax Upstreaming
+- Start Date: 2022-08-17
+- RFC PR: [apache/tvm-rfcs#0089](https://github.com/apache/tvm-rfcs/pull/0089)
+- GitHub Issue: [apache/tvm#0000](https://github.com/apache/tvm/issues/0000)
+- Co-Authors: [@denise-k](https://github.com/denise-k), [@jwfromm](https://github.com/jwfromm)
+
+# 1. **Summary**
+
+This RFC proposes to upstream the core foundation of Relax (Relay Next). Relax is a new graph-level IR that enables new capabilities to address the [critical needs](https://discuss.tvm.apache.org/t/establish-tvm-unity-connection-a-technical-strategy/13344) identified by the TVM community over the years of using and developing deep learning compilers.
+
+# 2. **Motivation and goals**
+
+Relax is an effort within [TVM Unity](https://tvm.apache.org/2021/12/15/tvm-unity) that aims to evolve the graph-level IR to maximize **expressibility, performance, and portability** across today’s and tomorrow’s workloads. Relax has three key goals motivated by the TVM community’s needs and by the lessons the community has learned in ML acceleration through years of using and developing TVM:
+
+- Build a unified interface that transcends the boundaries of TVM’s abstractions between graph-level IR, tensor programs (TensorIR), and runtime libraries (PackedFunc);
+- Enable and optimize dynamic shape workloads;
+- Support “computational graph” style optimizations with advanced dataflow semantics.
+
+For more details on the design goals of Relax, please check out the [discuss forum post](https://discuss.tvm.apache.org/t/relax-co-designing-high-level-abstraction-towards-tvm-unity/12496).
+
+The main focus of this upstreaming RFC is to upstream the **core foundation** of Relax as an **optional** compilation flow in TVM with two principles:
+
+- **Minimize disruption:** This upstreaming should provide a **non-default** path to enable new capabilities for users/developers who are interested in what Relax brings, so it will not break the current default Relay flow.
+- **Minimize complexity:** This upstreaming should reuse existing TVM/Relay infrastructure as much as possible (for example IRModule, runtime Module, TOPI library, etc.) to avoid duplicated effort and code.
+
+This initial upstreaming will open the path for TVM Unity, and incrementally bring Relax into the community.
+
+# 3. **Guide-level explanation**
+
+This section introduces the three major design points of Relax, which map directly to the three key goals described in the previous section. We first introduce what the user-facing interface will look like after this RFC lands.
+
+(Most of the code examples in this RFC are written in [TVMScript](https://github.com/apache/tvm-rfcs/pull/74/files#diff-6965a40ad8df7618ae68e11c88f924542a506c74a931cc3011ae9f99989b5f51R21-R27), which enables users to write and print TVM programs containing both Relax and TIR functions with Python syntax.)
+
+## User-facing interface
+
+After this upstreaming lands, users will be able to write a Relax program in TVMScript or translate a model directly from Relay. Relax provides a simple API to compile the IRModule to a VM executable and run it on the Relax VM.
+
+```python
+import numpy as np
+
+import tvm
+import tvm.script
+from tvm import relax
+from tvm.script import relax as R, tir as T
+
+# Relax IRModule written in TVMScript
+@tvm.script.ir_module
+class MyIRModule:
+    # This is a TIR PrimFunc which calls the TIR intrinsic T.exp
+    @T.prim_func
+    def tir_exp_func(x: T.handle, y: T.handle): ## <= D2
+        n = T.var("int32")  # symbolic extent shared by both buffers
+        X = T.match_buffer(x, (n,), "float32")
+        Y = T.match_buffer(y, (n,), "float32")
+        for i in T.grid(n):
+            Y[i] = T.exp(X[i])
+
+    # This is a Relax function which contains a dataflow block
+    # representing a computational graph, as well as a call to an
+    # opaque packed function which performs an in-place update to the
+    # data in variable gv0.
+    # We mark the corresponding design points (D0, D1, D2) that map to
+    # the following sections throughout the Relax function below.
+    @R.function
+    def relax_func(x: R.Tensor[(n, k), "float32"], w: R.Tensor[_, "float32"]):
+    # n, k above are implicitly defined within the function signature
+    # so we will be able to refer to n, k within all of relax_func
+        with R.dataflow(): ## <= D2
+            lv0 = R.match_shape(w, (k, m)) ## <= D1
+            lv1: R.Tensor[(n, m), "float32"] = R.dot(x, lv0)
+            lv2: R.Tensor[(n * m,), "float32"] = R.flatten(lv1) ## <= D1
+            lv3: R.Shape = (n * m,)  ## <= D1
+            gv0 = R.call_tir(tir_exp_func, [lv2], lv3, dtype="float32") ## <= D0
+            R.outputs(gv0)
+
+        R.call_packed("custom_inplace_update", gv0) ## <= D0, D2
+        return gv0
+
+# Print IRModule with syntax highlighting
+MyIRModule.show()
+
+# Build the Relax IRModule
+target = tvm.target.Target("llvm")
+ex = relax.vm.build(MyIRModule, target)
+
+# Dump the VM executable instructions as text
+print(ex.as_text())
+
+# Run the function on the Relax VM runtime
+vm = relax.VirtualMachine(ex, tvm.cpu())
+shape = (2, 3)
+data = tvm.nd.array(np.random.rand(*shape).astype(np.float32))
+res = vm["relax_func"](data)
+```
+
+## D0: **Unified abstractions and optimizations across layers**
+
+The first key design point is to allow the high-level graph IR to directly interact with and call into the lower-level TensorIR and PackedFunc (TVM FFI).
+
+TensorIR PrimFuncs and many external libraries adopt a **destination-passing-style** (DPS) calling convention, in which both inputs and outputs are passed to the function as arguments and the outputs are mutated directly inside the function:
+
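+For example, a DPS-style PrimFunc takes its pre-allocated output buffer as an argument and writes the result into it. The `tir_add` function below is only an illustrative sketch of this convention (the name and the fixed shape are placeholders):
+
+```python
+@T.prim_func
+def tir_add(x: T.handle, y: T.handle, z: T.handle):
+    # Inputs (x, y) and the output (z) are all passed in as arguments;
+    # the caller allocates z, and this function mutates it in place.
+    X = T.match_buffer(x, (128,), "float32")
+    Y = T.match_buffer(y, (128,), "float32")
+    Z = T.match_buffer(z, (128,), "float32")
+    for i in T.grid(128):
+        Z[i] = X[i] + Y[i]
+```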

Review Comment:
   In addition to @slyubomirsky and @Hzfengsy's great points, I will share my thoughts from the perspective of optimization and the compilation pipeline.
   
   Although it might have been possible, the interaction between the graph IR and TensorIR/PackedFunc has been quite tricky in the Relay world. This has caused significant difficulties and non-trivial engineering effort, IMO. Here are some representative examples:
   
   - In Relay, there has been no convenient way to optimize the graph IR using feedback from the low level.
     - If TensorIR performs a layout transformation for a PrimFunc, its decision affects other PrimFuncs as well. However, Relay cannot feed such information back to the graph-IR level, since the two different IRs cannot co-exist.
     - Graph-level tuning methods (e.g., TASO, Collage) need the capability to apply a set of passes to part of the graph, compile and measure its performance, and feed the performance number back to the graph-IR level to generate better candidates (a rough sketch of such a measure-and-feed-back loop follows this list). Although this could be achieved with non-trivial engineering effort, it would complicate the compilation pipeline and its maintenance. IMHO, joint optimization across multiple graph tuners (e.g., TASO+Collage) would be practically impossible.
   - Lowering has to happen all at once at the boundary between Relay and TensorIR, and customizing lowering (e.g., partial or custom lowering) has been very challenging.
     - The main pipeline built around `OpStrategy` has not been easy to customize when you want to lower part of the graph for your own target (e.g., BYOC) while keeping the other parts in the high-level IR. Therefore, people had to apply their own lowering mechanism (e.g., `RelayToTIR`) that bypasses the main pipeline.
     - If you only want to apply certain schedule rules to part of the graph IR, you need to lower those parts and then apply the schedule rules to them. Such freedom has not been allowed in the Relay main pipeline, so people had to find workarounds (e.g., use task extraction and find the PrimFunc among the extracted tasks; if extraction does not behave as the user wants, this requires extra engineering effort).
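   
   To make the tuning-feedback point concrete, here is a rough sketch of what one measure-and-feed-back step could look like once graph-level passes, building, and execution share one module-level flow. The helper name `measure_candidate`, the caller-supplied `candidate_passes` list, and the assumption that the extracted subgraph's entry function is called `relax_func` are all illustrative, not an actual API:
   
   ```python
   import time
   
   import tvm
   from tvm import relax
   
   
   def measure_candidate(sub_mod, candidate_passes, inputs, target="llvm"):
       """Apply a pass pipeline to a candidate subgraph, build it, and time one run."""
       seq = tvm.transform.Sequential(candidate_passes)   # graph-level passes to evaluate
       transformed = seq(sub_mod)
       ex = relax.vm.build(transformed, tvm.target.Target(target))
       vm = relax.VirtualMachine(ex, tvm.cpu())
       start = time.perf_counter()
       vm["relax_func"](*inputs)                          # assumed entry function name
       return time.perf_counter() - start                 # feedback for the graph-level tuner
   ```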
   
   Since Relax unifies the abstractions, it can deliver those functionalities as compiler passes while providing flexibility and customizability. For example, since the high-level and low-level IRs co-exist, if TensorIR makes an optimization decision with a global effect, like a layout transformation, we can rewrite the graph-level IR accordingly to express that change and account for its global implications. Also, lowering can be implemented as a Relax IR -> TensorIR transformation pass. If you want to bring your own lowering mechanism, you can write a new pass (a skeleton is sketched below); I expect you may be able to reuse most of the lowering machinery and only change the part about "how" you want to lower.
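   
   A minimal sketch of such a pass, assuming only the existing TVM pass infrastructure (`tvm.ir.transform.module_pass`); the op-selection logic and the actual rewrite into `R.call_tir` are left as placeholder comments:
   
   ```python
   import tvm
   from tvm import relax
   
   
   @tvm.ir.transform.module_pass(opt_level=0, name="MyCustomLowering")
   def my_custom_lowering(mod, ctx):
       # Walk every Relax function in the module and decide, per function or
       # per call site, whether to lower it here or leave it to a later pass.
       for gvar, func in mod.functions.items():
           if isinstance(func, relax.Function):
               # Placeholder: rewrite the selected high-level ops into
               # R.call_tir calls to hand-written or generated PrimFuncs.
               pass
       return mod
   ```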
   
   I would be happy to discuss further if you are interested in this direction. :) 


