Posted to commits@tvm.apache.org by jr...@apache.org on 2021/12/16 03:16:25 UTC

[tvm-site] 01/01: Add TVM Unity blog post

This is an automated email from the ASF dual-hosted git repository.

jroesch pushed a commit to branch tvm-unity
in repository https://gitbox.apache.org/repos/asf/tvm-site.git

commit 6cfb138ab5383ada9e151003822f864cd2cae62e
Author: Jared Roesch <ro...@gmail.com>
AuthorDate: Wed Dec 15 19:15:47 2021 -0800

    Add TVM Unity blog post
---
 Gemfile                        |   2 +
 _posts/2021-12-16-tvm-unity.md | 112 +++++++++++++++++++++++++++++++++++++++++
 images/tvm-unity/image1.png    | Bin 0 -> 333125 bytes
 images/tvm-unity/image2.png    | Bin 0 -> 739514 bytes
 images/tvm-unity/image3.png    | Bin 0 -> 276661 bytes
 images/tvm-unity/image4.png    | Bin 0 -> 200252 bytes
 6 files changed, 114 insertions(+)

diff --git a/Gemfile b/Gemfile
index 85ec323..dea4159 100644
--- a/Gemfile
+++ b/Gemfile
@@ -28,3 +28,5 @@ end
 # Performance-booster for watching directories on Windows
 gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin]
 
+
+gem "webrick", "~> 1.7"
diff --git a/_posts/2021-12-16-tvm-unity.md b/_posts/2021-12-16-tvm-unity.md
new file mode 100644
index 0000000..97ad0a3
--- /dev/null
+++ b/_posts/2021-12-16-tvm-unity.md
@@ -0,0 +1,112 @@
+---
+layout: post
+title: "Apache TVM Unity: a vision for the ML software & hardware ecosystem in 2022"
+date: 2021-12-15
+author: Adrian Sampson, Tianqi Chen, Jared Roesch
+---
+
+Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader shift coming in the way that machine learning system stacks optimize for flexibility and agility in the face of a rapidly changing hardware landscape. TVM will evolve to break down the boundaries that constrain the ways current ML systems adapt to rapid changes in ML models and the accelerators that implement them.
+
+## Boundaries in the Modern ML System Stack
+
+![image](/images/tvm-unity/image4.png){: style="width: 40%; margin: auto; display: block;" }
+
+The system stack for modern machine learning consists of four kinds of abstractions:
+1. The *computational graph* abstraction encodes the flow of data between coarse-grained tensor operators. Computational graphs are the high-level abstraction users interact with in [TensorFlow](https://www.tensorflow.org/), [MXNet](https://mxnet.apache.org/), and [PyTorch](https://pytorch.org/).
+2. *Tensor programs* implement the code for the operators in the computational graph. Deep learning compilers generate the low-level C++ or CUDA code for computations like convolutions or matrix multiplications.
+3. Similarly, *libraries and runtimes* include pre-written code to execute and orchestrate tensor operations. BLAS packages and libraries like cuDNN provide extensively tuned operator implementations for specific hardware targets.
+4. *Hardware primitives* are at the bottom of the stack. Here, low-level assembly languages and hardware accelerator interfaces expose the raw capabilities of the machine.
+
+There are *vertical* boundaries between the abstraction levels that prohibit cross-layer interactions and feedback between the levels. There is also a *horizontal* boundary between two opposing ways that software stacks can treat the central tensor computation level. The horizontal boundary divides *library-based* and *compilation-based* approaches to tensor computation.
+
+![image](/images/tvm-unity/image1.png){: style="width: 70%; margin: auto; display: block;" }
+
+Library-based frameworks rely on collections of pre-made, carefully tuned operator implementations as their computational workhorse. Compilation-based frameworks instead generate their own custom tensor operation code from scratch. Modern software stacks typically use one style or the other, but they don’t combine them: most deep learning frameworks are library-based, while most deep learning compilers cannot incorporate libraries and runtimes.
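+
+To make the contrast concrete, the sketch below shows the compilation-based style using TVM’s existing `te` API: we describe a vector addition mathematically and let the compiler generate the operator code, rather than dispatching to a pre-built kernel.
+
+```python
+import tvm
+from tvm import te
+import numpy as np
+
+# Describe the computation; TVM generates the operator code from this.
+n = 1024
+A = te.placeholder((n,), name="A")
+B = te.placeholder((n,), name="B")
+C = te.compute((n,), lambda i: A[i] + B[i], name="C")
+
+# Build a CPU kernel from scratch -- no pre-made library kernel involved.
+s = te.create_schedule(C.op)
+fadd = tvm.build(s, [A, B, C], target="llvm")
+
+a = tvm.nd.array(np.random.rand(n).astype("float32"))
+b = tvm.nd.array(np.random.rand(n).astype("float32"))
+c = tvm.nd.array(np.zeros(n, dtype="float32"))
+fadd(a, b, c)
+```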
+
+In the current landscape of ML systems, the boundaries between these layers tend to be strict. Neither approach is better than the other; rather, they have different trade-offs. Library-based stacks excel on standard styles of ML models because they benefit from years of engineering investment in common operators. On the other side, the flexibility and automation in compilation-based frameworks can be better for emerging models that require new operators.
+
+Vertical boundaries exist in both styles of software stack. AI applications start at the top of the stack and march through the layers from top to bottom. Frameworks choose data layouts and operator fusion strategies at the graph level; the tensor computations then carry out the operators selected in the computational graph; and those operators map onto a fixed set of hardware primitives. It’s a one-shot, unidirectional workflow: performance constraints at the level of tensor programs, for example, cannot feed back to influence the choices made at the graph level.
+
+Both vertical and horizontal boundaries are slowing down the pace of innovation in machine learning. New hardware accelerators are emerging with new levels of capability and performance, but harnessing them will require the fluid collaboration between ML scientists, ML engineers, and hardware vendors that these boundaries prevent. To cope with the rapid pace of change in ML systems, frameworks need to support **incremental** evolution: incorporating new capabilities should require effort proportional to the size of the change, not a wholesale reworking of the rest of the stack.
+
+## TVM Unity
+
+The TVM Unity vision is about breaking down these barriers. The goal is to enable cross-layer interactions and automate their optimization. It is not to collapse the abstraction layers into a monolith: there is no “silver bullet” representation for AI programs that simultaneously enables optimization at every level. Instead, TVM Unity will build interfaces for the abstractions to interact and exchange information.
+
+Removing the strict barriers between the levels in the system stack will enable new kinds of optimization that work jointly across the layers. A unified view of the entire system will let TVM automatically co-optimize decisions in the computation graph, the tensor operators, and the hardware mapping to search for the best possible implementation of an AI application. At the same time, TVM Unity will also serve as a communication substrate for interactions between ML scientists, ML engineers, and hardware engineers.
+
+### Unifying Abstractions
+
+![image](/images/tvm-unity/image2.png){: style="width: 70%; margin: auto; display: block;" }
+
+TVM Unity will focus on letting AI applications fluidly cross the boundaries between operator graphs, tensor programs, and hardware primitives. In TVM, a single Python program can define a core tensor operation, incorporate a custom hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:
+
+```python
+import tvm.script
+from tvm.script import tir as T, relax as R
+
+@tvm.script.ir_module
+class MyIRModule:
+    # Define a TIR-based operation (n, d, m are symbolic shape variables).
+    @T.prim_func
+    def tir_mm(X: T.Buffer[(n, d), "float32"],
+               W: T.Buffer[(d, m), "float32"],
+               Y: T.Buffer[(n, m), "float32"]):
+        for i, j, k in T.grid(n, m, d):
+            with T.block("body"):
+                vi, vj, vk = T.axis.remap("SSR", [i, j, k])
+                with T.init():
+                    Y[vi, vj] = 0.0
+                # Can be mapped onto HW intrinsics.
+                Y[vi, vj] += X[vi, vk] * W[vk, vj]
+
+    @R.function
+    def relax_func(x: R.Tensor[(n, d), "float32"], w: R.Tensor[(d, m), "float32"]):
+        with R.dataflow():
+            # Invoke the TIR code.
+            lv0: R.Tensor[(n, m), "float32"] = R.call_dps((n, m), tir_mm, [x, w])
+            lv1: R.Tensor[(n * m,), "float32"] = R.flatten(lv0)
+            gv0: R.Tensor[(n * m,), "float32"] = R.exp(lv1)
+            R.output(gv0)
+
+        # Invoke an external update rule through the packed-function interface.
+        R.call_packed("custom_inplace_update", gv0)
+        return gv0
+```
+
+This code has both a tensor program (`tir_mm`) and a computational graph that includes it (`relax_func`). The high-level data flow can directly invoke the low-level tensor manipulation to build up a larger computation. The TVM runtime unifies the operator graph and compiler-based tensor computation to optimize the entire program. This code also uses `call_packed` to invoke a pre-baked operator, showing how TVM can smoothly integrate library-based operators with the custom computation.
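+
+For instance, the `custom_inplace_update` name above resolves through TVM’s packed-function mechanism. As a rough sketch of how that works today (the function body here is purely illustrative), anything registered with the runtime becomes callable by name:
+
+```python
+import tvm
+import numpy as np
+
+# Register an external routine with the TVM runtime under the name that
+# the example above invokes via R.call_packed.
+@tvm.register_func("custom_inplace_update")
+def custom_inplace_update(x):
+    updated = x.numpy() + 1.0  # hand the data to any external library code
+    x.copyfrom(updated)        # write the result back in place
+
+# Generated code (or Python) can now look the routine up by name.
+f = tvm.get_global_func("custom_inplace_update")
+arr = tvm.nd.array(np.zeros(4, dtype="float32"))
+f(arr)
+```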
+
+Additionally, TensorIR opens doors to exploit hardware primitives through tensorization. Tensorization transforms loop-level programs to implementations that map onto the primitives that a particular hardware target declares.
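+
+To sketch the idea with today’s `te` schedule API (the TensorIR interface for this is still under development), suppose a target exposes a 4-wide vector add. The `hw_vadd4` extern below is a hypothetical stand-in for that primitive, and `tensorize` rewrites the inner loop into a call to it:
+
+```python
+import tvm
+from tvm import te
+
+def intrin_vadd(w):
+    # Declare the computation that the hardware primitive implements.
+    a = te.placeholder((w,), name="a")
+    b = te.placeholder((w,), name="b")
+    c = te.compute((w,), lambda i: a[i] + b[i], name="c")
+    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="Ab", offset_factor=1, strides=[1])
+    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="Bb", offset_factor=1, strides=[1])
+    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="Cb", offset_factor=1, strides=[1])
+
+    def intrin_func(ins, outs):
+        # Replace the loop body with a call to the (hypothetical) primitive.
+        ib = tvm.tir.ir_builder.create()
+        aa, bb = ins
+        cc = outs[0]
+        ib.emit(tvm.tir.call_extern("int32", "hw_vadd4",
+                                    cc.access_ptr("w"),
+                                    aa.access_ptr("r"),
+                                    bb.access_ptr("r")))
+        return ib.get()
+
+    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})
+
+n = 1024
+A = te.placeholder((n,), name="A")
+B = te.placeholder((n,), name="B")
+C = te.compute((n,), lambda i: A[i] + B[i], name="C")
+
+s = te.create_schedule(C.op)
+xo, xi = s[C].split(C.op.axis[0], factor=4)
+s[C].tensorize(xi, intrin_vadd(4))   # inner loop becomes one hw_vadd4 call
+print(tvm.lower(s, [A, B, C], simple_mode=True))
+```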
+
+The key point to highlight here is **cross-layer interactions**. This example shows interactions between: (1) the computational graph and tensor programs; (2) the computational graph and runtime libraries; and (3) tensor programs and hardware primitives, through the ongoing automatic tensorization work in TensorIR. These cross-layer interactions open the door to making **incremental optimizations** at each boundary. For example, we can build a customized pass to lower part of the subgraph in a specialized way while keeping the rest of the computation unchanged.
+
+In addition to unifying the abstraction layers, we are also working on unifying the shape representation to enable **first-class symbolic shape support** across the stack. In our example, the symbolic shape dimensions (n, m) can flow across the abstractions and enable advanced optimizations for dynamic workloads. These capabilities will open the door to optimizing both training and inference workloads.
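+
+Today’s `te` API gives a small taste of what first-class symbolic shapes enable. In the sketch below, a kernel is compiled once against a symbolic length `n` and then serves inputs of any size:
+
+```python
+import tvm
+from tvm import te
+import numpy as np
+
+n = te.var("n")  # symbolic shape dimension
+A = te.placeholder((n,), name="A")
+B = te.compute((n,), lambda i: A[i] * 2.0, name="B")
+
+s = te.create_schedule(B.op)
+fdouble = tvm.build(s, [A, B], target="llvm")
+
+# One compiled artifact handles every input length.
+for length in (4, 1024):
+    x = tvm.nd.array(np.arange(length, dtype="float32"))
+    y = tvm.nd.array(np.zeros(length, dtype="float32"))
+    fdouble(x, y)
+```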
+
+### Unifying Perspectives
+
+Better ML systems require collaboration between ML scientists, ML engineers, and hardware engineers. The coming era of diverse specialized ML hardware will require coordinated effort from teams that include all three groups. By building rich, bidirectional interfaces between the layers in the system stack, TVM Unity aims to be the medium through which this collaboration and iteration happens.
+
+Abstractions in TVM can catalyze the lifecycle of an improvement to an AI application. At the highest level, an ML scientist can specify the operator they need to construct the next generation of a model. ML engineers can work at the tensor computation level to make this new operation efficient. Finally, these tensor computations can rely on hardware primitives written by hardware engineers. The work at each level will interact through Python APIs within the TVM ecosystem. The ability to iterate at every level of the stack from the same environment will shorten the loop from a high-level idea to its efficient implementation.
+
+### Automation
+
+A unified ML system creates a new, larger search space than a system stack with strict boundaries. Decisions within tensor computations can influence the structure of the operator graph, and new hardware primitives can drastically change the optimal mappings at every other layer.
+
+TVM Unity will expose all these cross-layer interactions for automated optimization. Finding the best implementation for a given application will require learning-driven optimization: using ML to optimize ML by exploring the expanded joint search space and minimizing the computational cost.
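+
+TVM’s existing auto-scheduler already applies this recipe at the tensor-program level, and Unity extends the same learning-driven search across layers. A minimal sketch with the current `auto_scheduler` API (the trial count and log file name are arbitrary choices):
+
+```python
+import tvm
+from tvm import te, auto_scheduler
+
+@auto_scheduler.register_workload
+def matmul(N, M, K):
+    A = te.placeholder((N, K), name="A")
+    B = te.placeholder((K, M), name="B")
+    k = te.reduce_axis((0, K), name="k")
+    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
+    return [A, B, C]
+
+# Search the space of schedules, guided by a learned cost model.
+task = auto_scheduler.SearchTask(func=matmul, args=(128, 128, 128),
+                                 target=tvm.target.Target("llvm"))
+task.tune(auto_scheduler.TuningOptions(
+    num_measure_trials=64,
+    measure_callbacks=[auto_scheduler.RecordToFile("matmul_tuning.json")],
+))
+sch, args = task.apply_best("matmul_tuning.json")
+```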
+
+In addition, we want to leverage the help of domain experts where possible, and create mechanisms that effectively incorporate their domain knowledge to guide the automatic optimizations.
+
+## New Capabilities with Unity
+
+The Unity vision guides the technical roadmap for TVM’s evolution over the next year. The unified approach will position TVM to offer new forms of automation and ecosystem integration that are not possible with today’s system stacks.
+
+With Unity, TVM will unify library-based computation with compiler-based automation. AI applications will be able to combine the world’s best-known code for common operators with automatically optimized code for computations that don’t map neatly onto any existing operator. Developers will be able to smoothly transition between both strategies without a steep “performance cliff” when switching from built-in to generated code. Teams will be able to iterate rapidly with compiled code for new models and operators, and adopt library implementations where well-tuned ones already exist.
+
+TVM also aims to serve as a bridge to unify the broader ML and hardware ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not constrain teams’ choice of frameworks. TVM models will be easy to embed into other frameworks and runtimes as subgraphs for both training and inference. Through exchange formats like [ONNX](https://onnx.ai/) and [TorchScript](https://pytorch.org/docs/stable/jit.html), TVM models can fluidly integrate into larger applications built on any infrastructure.
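+
+For example, with the current Relay frontend, an ONNX model drops into TVM in a few lines (the model path and input shape below are placeholders):
+
+```python
+import onnx
+import tvm
+from tvm import relay
+from tvm.contrib import graph_executor
+
+# Load an exchange-format model and convert it to TVM's representation.
+model = onnx.load("model.onnx")  # placeholder path
+mod, params = relay.frontend.from_onnx(model, shape_dict={"input": (1, 3, 224, 224)})
+
+# Compile it and wrap the result for use inside a larger application.
+with tvm.transform.PassContext(opt_level=3):
+    lib = relay.build(mod, target="llvm", params=params)
+
+dev = tvm.cpu()
+runtime = graph_executor.GraphModule(lib["default"](dev))
+```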
+
+![image](/images/tvm-unity/image3.png){: style="width: 50%; margin: auto; display: block;" }
+
+Beyond TVM alone, the same forces that are driving TVM Unity exist across the theory and practice of modern ML. Rapid changes to models, emerging alternative hardware, and aging abstraction boundaries all point toward the need for an integrated approach. We expect TVM to lead the way into the next great industry-wide shift in ML systems.
+
+To learn more about our vision for TVM, check out the talks and discussion at [TVMCon 2021](https://www.tvmcon.org).
diff --git a/images/tvm-unity/image1.png b/images/tvm-unity/image1.png
new file mode 100644
index 0000000..616a144
Binary files /dev/null and b/images/tvm-unity/image1.png differ
diff --git a/images/tvm-unity/image2.png b/images/tvm-unity/image2.png
new file mode 100644
index 0000000..a23cd2e
Binary files /dev/null and b/images/tvm-unity/image2.png differ
diff --git a/images/tvm-unity/image3.png b/images/tvm-unity/image3.png
new file mode 100644
index 0000000..4a11da3
Binary files /dev/null and b/images/tvm-unity/image3.png differ
diff --git a/images/tvm-unity/image4.png b/images/tvm-unity/image4.png
new file mode 100644
index 0000000..d8d7657
Binary files /dev/null and b/images/tvm-unity/image4.png differ