Posted to commits@tvm.apache.org by jr...@apache.org on 2021/12/16 03:32:22 UTC

[tvm-site] branch asf-site updated: Build at Wed 15 Dec 2021 07:32:18 PM PST

This is an automated email from the ASF dual-hosted git repository.

jroesch pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 2cade43  Build at Wed 15 Dec 2021 07:32:18 PM PST
2cade43 is described below

commit 2cade4330f723b7383b3215cbb231fa21e67c37a
Author: Jared Roesch <ro...@gmail.com>
AuthorDate: Wed Dec 15 19:32:18 2021 -0800

    Build at Wed 15 Dec 2021 07:32:18 PM PST
---
 .gitignore                                    |   3 +
 2020/06/04/tinyml-how-tvm-is-taming-tiny.html |   2 +-
 2020/07/14/bert-pytorch-tvm.html              |   2 +-
 2020/09/26/bring-your-own-datatypes.html      |   6 +-
 2021/12/15/tvm-unity.html                     | 305 ++++++++++++++++++++++
 atom.xml                                      | 352 +++++++++----------------
 blog.html                                     |  10 +
 feed.xml                                      | 280 ++++++++------------
 images/tvm-unity/image1.png                   | Bin 0 -> 333125 bytes
 images/tvm-unity/image2.png                   | Bin 0 -> 739514 bytes
 images/tvm-unity/image3.png                   | Bin 0 -> 276661 bytes
 images/tvm-unity/image4.png                   | Bin 0 -> 200252 bytes
 rss.xml                                       | 354 +++++++++-----------------
 sitemap.txt                                   |   1 +
 14 files changed, 677 insertions(+), 638 deletions(-)

diff --git a/.gitignore b/.gitignore
index cf6401d..72a3cf5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,3 +6,6 @@ website.tgz
 .jekyll-cache
 docs.tgz
 Gemfile.lock
+.bundle/
+vendor/
+
diff --git a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
index ec640c7..633567e 100644
--- a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
+++ b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
@@ -182,7 +182,7 @@ A standard µTVM setup, where the host communicates with the device via JTAG.</p
   <span class="n">graph_mod</span> <span class="o">=</span> <span class="n">graph_runtime</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">micro_mod</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">tvm</span><span class="p">.</span><span class="n">micro_dev</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
   <span class="n">graph_mod</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data_np</span><span class="p">)</span>
   <span class="n">prediction</span> <span class="o">=</span> <span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">graph_mod</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">asnumpy</span><span class="p">())]</span>
-  <span class="k">print</span><span class="p">(</span><span class="s">f'prediction was </span><span class="si">{</span><span class="n">prediction</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
+  <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'prediction was </span><span class="si">{</span><span class="n">prediction</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
 </code></pre></div></div>
 
 <p>Below are the performance results of MicroTVM, compared with <a href="https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0">CMSIS-NN version 5.7.0</a> (commit <code class="language-plaintext highlighter-rouge">a65b7c9a</code>), a hand-optimized library of ML kernels.</p>
diff --git a/2020/07/14/bert-pytorch-tvm.html b/2020/07/14/bert-pytorch-tvm.html
index 387e219..6e0e11a 100644
--- a/2020/07/14/bert-pytorch-tvm.html
+++ b/2020/07/14/bert-pytorch-tvm.html
@@ -359,7 +359,7 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
                 <span class="k">else</span><span class="p">:</span>
                     <span class="n">name</span> <span class="o">=</span> <span class="s">'...'</span>
                 <span class="n">attr_str</span> <span class="o">=</span> <span class="s">''</span>
-            <span class="n">s</span> <span class="o">=</span> <span class="s">f'</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">arg_str</span><span class="si">}{</span><span class="n">attr_str</span><span class="si">}</span><span class="s">)'</span>
+            <span class="n">s</span> <span class="o">=</span> <span class="sa">f</span><span class="s">'</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">arg_str</span><span class="si">}{</span><span class="n">attr_str</span><span class="si">}</span><span class="s">)'</span>
             <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="n">s</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
             <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args_with_edge</span><span class="p">:</span>
                 <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">arg</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
diff --git a/2020/09/26/bring-your-own-datatypes.html b/2020/09/26/bring-your-own-datatypes.html
index 135d0db..c4ce96f 100644
--- a/2020/09/26/bring-your-own-datatypes.html
+++ b/2020/09/26/bring-your-own-datatypes.html
@@ -157,7 +157,7 @@
 <h2 id="introduction">Introduction</h2>
 
 <p>When designing accelerators, an important decision is how one will approximately represent real numbers in hardware.
-This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.<sup id="fnref:ieee" role="doc-noteref"><a href="#fn:ieee" class="footnote">1</a></sup>
+This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.<sup id="fnref:ieee" role="doc-noteref"><a href="#fn:ieee" class="footnote" rel="footnote">1</a></sup>
 Yet,
   when trying to squeeze
   the most out of hardware
@@ -177,7 +177,7 @@ Due to the lax numerical requirements
   this truncation often has no effect
   on model accuracy,
   while instantly cutting the storage cost
-  in half.<sup id="fnref:jouppi2017datacenter" role="doc-noteref"><a href="#fn:jouppi2017datacenter" class="footnote">2</a></sup><sup id="fnref:tensorflowbfloat" role="doc-noteref"><a href="#fn:tensorflowbfloat" class="footnote">3</a></sup></p>
+  in half.<sup id="fnref:jouppi2017datacenter" role="doc-noteref"><a href="#fn:jouppi2017datacenter" class="footnote" rel="footnote">2</a></sup><sup id="fnref:tensorflowbfloat" role="doc-noteref"><a href="#fn:tensorflowbfloat" class="footnote" rel="footnote">3</a></sup></p>
 
 <p>Before researchers begin building hardware for their datatype, however, they first need to determine how their datatype will behave numerically in the workloads they care about.
 This often involves first building a software-emulated version of their datatype
@@ -247,7 +247,7 @@ These steps are akin to
   <em>declaration</em> and <em>implementation</em> of the datatype,
   respectively.</p>
 
-<p>Please note that all referred code in this post are based on TVM repository’s master branch commit <a href="https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07" target="_blank">4cad71d</a>. We will use an example <code class="language-plaintext highlighter-rouge">posit</code> datatype which can be found under <code class="language-plaintext highlighter-rouge">src/target/datatype/posit/posit-wrapper.cc</code> and can be compiled in TVM with the <code c [...]
+<p>Please note that all referred code in this post are based on TVM repository’s master branch commit <a href="https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07" target="_blank">4cad71d</a>. We will use an example <code class="language-plaintext highlighter-rouge">posit</code> datatype which can be found under <code class="language-plaintext highlighter-rouge">src/target/datatype/posit/posit-wrapper.cc</code> and can be compiled in TVM with the <code c [...]
 
 <h3 id="datatype-registration">Datatype Registration</h3>
 
diff --git a/2021/12/15/tvm-unity.html b/2021/12/15/tvm-unity.html
new file mode 100644
index 0000000..84b0506
--- /dev/null
+++ b/2021/12/15/tvm-unity.html
@@ -0,0 +1,305 @@
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Apache TVM Unity: a vision for the ML software & hardware ecosystem in 2022</title>
+    <link rel="shortcut icon" href="/assets/images/favicon.ico">
+    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
+    <link rel="stylesheet" href="/css/slick.css">
+    <link rel="stylesheet" href="/css/slick-theme.css">
+    <link rel="stylesheet" href="/css/custom.css">
+</head>
+<body>
+
+    
+<div class="bannerPage">
+      <header class="header">
+      <div class="container">
+        <div class="headerInner d-flex justify-content-between align-items-center">
+          <div class="headerLogo">
+            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
+          </div>
+          <div id="headMenu" class="headerNav">
+            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
+                alt="Close"></button>
+                <ul class="nav">
+    
+    <li class="nav-item">
+        <a class="nav-link" href="/community">Community</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="/download">Download</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="/vta">VTA</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="/blog">Blog</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
+    </li>
+    
+    <li class="nav-item">
+        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
+    </li>
+    
+</ul>
+            <div class="responsiveasfdropdown">
+              <button type="button" class="btn-link">
+                ASF
+              </button>
+              <ul>
+    
+    <li>
+        <a href="https://www.apache.org/">Apache Homepage</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/licenses/">License</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/security/">Security</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/events/current-event">Events</a>
+    </li>
+    
+</ul>
+            </div>
+          </div>
+          <div class="responsiveMenuIcon">
+            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
+                alt="Menu Icon" /></button>
+          </div>
+          <div class="asfDropdown">
+            <div class="dropdown">
+              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
+                aria-expanded="false">
+                ASF
+              </button>
+              <div class="dropdown-menu dropdown-menu-right">
+                <ul>
+    
+    <li>
+        <a href="https://www.apache.org/">Apache Homepage</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/licenses/">License</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/security/">Security</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
+    </li>
+    
+    <li>
+        <a href="https://www.apache.org/events/current-event">Events</a>
+    </li>
+    
+</ul>
+              </div>
+            </div>
+          </div>
+        </div>
+      </div>
+    </header>
+
+</div>
+
+
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14 w-100">
+      <h1>Apache TVM Unity: a vision for the ML software & hardware ecosystem in 2022 </h1>
+      <p class="post-meta">
+        <time datetime="2021-12-15T00:00:00-08:00" itemprop="datePublished">
+          Dec 15, 2021
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Adrian Sampson, Tianqi Chen, Jared Roesch</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader shift coming in the way that machine learning system stacks optimize for flexibility and agility in the face of a rapidly changing hardware landscape. TVM will evolve to break down the boundaries that constrain the ways current ML systems adapt to rapid changes in ML models and the accelerators that implement them.</p>
+
+<h2 id="boundaries-in-the-modern-ml-system-stack">Boundaries in the Modern ML System Stack</h2>
+
+<p><img src="/images/tvm-unity/image4.png" alt="image" style="width: 40%; margin: auto; display: block;" /></p>
+
+<p>The system stack for modern machine learning consists of four kinds of abstractions:</p>
+<ol>
+  <li>The <em>computational graph</em> abstraction encodes the flow of data between coarse-grained tensor operators. Computational graphs are the high-level abstraction users interact with in <a href="https://www.tensorflow.org/">TensorFlow</a>, <a href="https://mxnet.apache.org/">MXNet</a>, and <a href="https://pytorch.org/">PyTorch</a>.</li>
+  <li><em>Tensor programs</em> implement the code for the operators in the computational graph. Deep learning compilers generate the low-level C++ or CUDA code for computations like convolutions or matrix multiplications.</li>
+  <li>Similarly, <em>libraries and runtimes</em> include pre-written code to execute and orchestrate tensor operations. BLAS packages and libraries like cuDNN provide extensively tuned operator implementations for specific hardware targets.</li>
+  <li><em>Hardware primitives</em> are at the bottom of the stack. Here, low-level assembly languages and hardware accelerator interfaces expose the raw capabilities of the machine.</li>
+</ol>
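+
+<p>As a rough illustration of the first two levels in this list (a hypothetical sketch in plain Python/NumPy, not TVM code), the same matrix multiplication can appear as a single coarse-grained operator call at the graph level, or as the explicit loop nest that a tensor program spells out:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def matmul_graph_level(x, w):
+    # Graph level: one coarse-grained operator; a library or a
+    # compiler supplies the actual implementation.
+    return x @ w
+
+def matmul_tensor_program(x, w):
+    # Tensor-program level: the loop nest a compiler would generate
+    # (or a library would hand-tune) for the same operator.
+    n, d = x.shape
+    _, m = w.shape
+    y = np.zeros((n, m), dtype=x.dtype)
+    for i in range(n):
+        for j in range(m):
+            for k in range(d):
+                y[i, j] += x[i, k] * w[k, j]
+    return y
+
+x = np.random.rand(4, 8).astype("float32")
+w = np.random.rand(8, 2).astype("float32")
+assert np.allclose(matmul_graph_level(x, w), matmul_tensor_program(x, w), atol=1e-5)
+</code></pre></div></div>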
+
+<p>There are <em>vertical</em> boundaries between the abstraction levels that prohibit cross-layer interactions and feedback between the levels. There is also a <em>horizontal</em> boundary between two opposing ways that software stacks can treat the central tensor computation level. The horizontal boundary divides <em>library-based</em> and <em>compilation-based</em> approaches to tensor computation.</p>
+
+<p><img src="/images/tvm-unity/image1.png" alt="image" style="width: 70%; margin: auto; display: block;" /></p>
+
+<p>Library-based frameworks rely on collections of pre-made, carefully tuned operator implementations as their computational workhorse. Compilation-based frameworks instead generate their own custom tensor operation code from scratch.  Modern software stacks typically use one style or the other, but they don’t combine them: most deep learning frameworks are library-based, while most deep learning compilers cannot incorporate libraries and runtimes.</p>
+
+<p>In the current landscape of ML systems, the boundaries between these layers tend to be strict. Neither approach is better than the other, but they have trade-offs. Library-based stacks excel on standard styles of ML models because they benefit from years of engineering investment in common operators. On the other hand, the flexibility and automation in compilation-based frameworks can be better for emerging models that require new operators.</p>
+
+<p>Vertical boundaries exist in both styles of software stack. AI applications start at the top of the stack and march through the layers from top to bottom. Frameworks choose data layout and operator fusion strategies at the graph level; then the tensor computations carry out the operators selected in the computational graph; and these operators map onto a fixed set of hardware primitives. It’s a one-shot, unidirectional workflow: performance constraints at the level of tensor programs, [...]
+
+<p>Both vertical and horizontal boundaries are slowing down the pace of innovation in machine learning. New hardware accelerators are emerging with new levels of capability and performance, but harnessing them will require fluid collaboration between ML scientists, ML engineers, and hardware vendors, a collaboration that these boundaries prevent. To cope with the rapid pace of change in ML systems, frameworks need to support <strong>incremental</strong> evolution: incorporating new capabilities should requir [...]
+
+<h2 id="tvm-unity">TVM Unity</h2>
+
+<p>The TVM Unity vision is about breaking down these barriers. The goal is to enable cross-layer interactions and automate their optimization. It is not to collapse the abstraction layers into a monolith: there is no “silver bullet” representation for AI programs that simultaneously enables optimization at every level. Instead, TVM Unity will build interfaces for the abstractions to interact and exchange information.</p>
+
+<p>Removing the strict barriers between the levels in the system stack will enable new kinds of optimization that work jointly across the layers. A unified view of the entire system will let TVM automatically co-optimize decisions in the computation graph, the tensor operators, and the hardware mapping to search for the best possible implementation of an AI application. At the same time, TVM Unity will also serve as a communication substrate for interactions between ML scientists, ML eng [...]
+
+<h3 id="unifying-abstractions">Unifying Abstractions</h3>
+
+<p><img src="/images/tvm-unity/image2.png" alt="image" style="width: 70%; margin: auto; display: block;" /></p>
+
+<p>TVM Unity will focus on letting AI applications fluidly cross the boundaries between operator graphs, tensor programs, and hardware primitives. In TVM, a single Python program can define a core tensor operation, incorporate a custom hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tvm.script</span>
+<span class="kn">from</span> <span class="nn">tvm.script</span> <span class="kn">import</span> <span class="n">tir</span> <span class="k">as</span> <span class="n">T</span><span class="p">,</span> <span class="n">relax</span> <span class="k">as</span> <span class="n">R</span>
+
+<span class="o">@</span><span class="n">tvm</span><span class="p">.</span><span class="n">script</span><span class="p">.</span><span class="n">ir_module</span>
+<span class="k">class</span> <span class="nc">MyIRModule</span><span class="p">:</span>
+    <span class="c1"># Define a TIR based operation.
+</span>    <span class="o">@</span><span class="n">T</span><span class="p">.</span><span class="n">prim_func</span>
+    <span class="k">def</span> <span class="nf">tir_mm</span><span class="p">(</span><span class="n">X</span><span class="p">:</span> <span class="n">T</span><span class="p">.</span><span class="n">Buffer</span><span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">d</span><span class="p">),</span> <span class="s">"float32"</span><span class="p">],</span>
+                   <span class="n">W</span><span class="p">:</span> <span class="n">T</span><span class="p">.</span><span class="n">Buffer</span><span class="p">[(</span><span class="n">d</span><span class="p">,</span> <span class="n">m</span><span class="p">),</span> <span class="s">"float32"</span><span class="p">],</span>
+                   <span class="n">Y</span><span class="p">:</span> <span class="n">T</span><span class="p">.</span><span class="n">Buffer</span><span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span><span class="p">),</span> <span class="s">"float32"</span><span class="p">]):</span>
+        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">k</span>  <span class="ow">in</span> <span class="n">T</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">d</span><span class="p">):</span>
+            <span class="k">with</span> <span class="n">T</span><span class="p">.</span><span class="n">block</span><span class="p">(</span><span class="s">"body"</span><span class="p">):</span>
+                <span class="n">vi</span><span class="p">,</span> <span class="n">vj</span><span class="p">,</span> <span class="n">vk</span> <span class="o">=</span> <span class="n">T</span><span class="p">.</span><span class="n">axis</span><span class="p">.</span><span class="n">remap</span><span class="p">(</span><span class="s">"SSR"</span><span class="p">,</span> <span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> [...]
+                <span class="k">with</span> <span class="n">T</span><span class="p">.</span><span class="n">init</span><span class="p">():</span>
+                    <span class="n">Y</span><span class="p">[</span><span class="n">vi</span><span class="p">,</span> <span class="n">vj</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
+                <span class="c1"># Can be mapped on to HW intrinsics.
+</span>                <span class="n">Y</span><span class="p">[</span><span class="n">vi</span><span class="p">,</span> <span class="n">vj</span><span class="p">]</span> <span class="o">+=</span> <span class="n">X</span><span class="p">[</span><span class="n">vi</span><span class="p">,</span> <span class="n">vk</span><span class="p">]</span> <span class="o">*</span> <span class="n">W</span><span class="p">[</span><span class="n">vk</span><span class="p">,</span> <span class="n">vj</span><span c [...]
+
+    <span class="o">@</span><span class="n">R</span><span class="p">.</span><span class="n">function</span>
+    <span class="k">def</span> <span class="nf">relax_func</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="n">R</span><span class="p">.</span><span class="n">Tensor</span><span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">d</span><span class="p">),</span> <span class="s">"float32"</span><span class="p">],</span> <span class="n">w</span><span class="p">:</span> <span class="n">R</span><span class="p">.</span>< [...]
+        <span class="k">with</span> <span class="n">R</span><span class="p">.</span><span class="n">dataflow</span><span class="p">():</span>
+            <span class="c1"># Invoke the TIR code.
+</span>            <span class="n">lv0</span><span class="p">:</span> <span class="n">R</span><span class="p">.</span><span class="n">Tensor</span><span class="p">[(</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span><span class="p">),</span> <span class="s">"float32"</span><span class="p">]</span> <span class="o">=</span> <span class="n">R</span><span class="p">.</span><span class="n">call_dps</span><span class="p">((</span><span class="n">n</span><span class [...]
+            <span class="n">lv1</span><span class="p">:</span> <span class="n">R</span><span class="p">.</span><span class="n">Tensor</span><span class="p">[(</span><span class="n">n</span> <span class="o">*</span> <span class="n">m</span><span class="p">,),</span> <span class="s">"float32"</span><span class="p">]</span> <span class="o">=</span> <span class="n">R</span><span class="p">.</span><span class="n">flatten</span><span class="p">(</span><span class="n">lv0</span><span class="p"> [...]
+            <span class="n">gv0</span><span class="p">:</span> <span class="n">R</span><span class="p">.</span><span class="n">Tensor</span><span class="p">[</span><span class="n">lv2</span><span class="p">,</span> <span class="s">"float32"</span><span class="p">]</span> <span class="o">=</span> <span class="n">R</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">lv1</span><span class="p">)</span>
+            <span class="n">R</span><span class="p">.</span><span class="n">output</span><span class="p">(</span><span class="n">gv0</span><span class="p">)</span>
+
+        <span class="c1"># Invoke external update rule.
+</span>        <span class="n">R</span><span class="p">.</span><span class="n">call_packed</span><span class="p">(</span><span class="s">"custom_inplace_update"</span><span class="p">,</span> <span class="n">gv0</span><span class="p">)</span>
+        <span class="k">return</span> <span class="n">gv0</span>
+</code></pre></div></div>
+
+<p>This code has both a tensor program (<code class="language-plaintext highlighter-rouge">tir_mm</code>) and a computational graph that includes it (<code class="language-plaintext highlighter-rouge">relax_func</code>). The high-level data flow can directly invoke the low-level tensor manipulation to build up a larger computation. The TVM runtime unifies the operator graph and compiler-based tensor computation to optimize the entire program. This code also uses <code class="language-plain [...]
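+
+<p>In particular, the destination-passing convention behind <code class="language-plaintext highlighter-rouge">call_dps</code> can be sketched in ordinary Python (a hypothetical stand-in, not the TVM runtime API): the caller allocates the output buffer and the low-level kernel writes into it, which keeps memory planning in the hands of the graph level:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def tir_mm_kernel(x, w, y):
+    # A kernel in destination-passing style: it fills a
+    # pre-allocated output buffer instead of returning a new one.
+    y[:, :] = x @ w
+
+def call_dps(shape, kernel, args):
+    # Hypothetical helper mirroring the pattern: the caller owns
+    # the allocation, the kernel only computes.
+    out = np.empty(shape, dtype="float32")
+    kernel(*args, out)
+    return out
+
+x = np.ones((2, 3), dtype="float32")
+w = np.ones((3, 4), dtype="float32")
+lv0 = call_dps((2, 4), tir_mm_kernel, [x, w])
+</code></pre></div></div>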
+
+<p>Additionally, TensorIR opens doors to exploit hardware primitives through tensorization. Tensorization transforms loop-level programs to implementations that map onto the primitives that a particular hardware target declares.</p>
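+
+<p>As a schematic example of that transformation (plain Python, with a made-up stand-in for a real hardware intrinsic), tensorization replaces the innermost loop of a program with calls to a wider primitive that the target declares:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+def dot4(a, b):
+    # Stand-in for a 4-element dot-product hardware intrinsic.
+    return float(a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3])
+
+def dot_loop_level(x, y):
+    # Loop-level program: one multiply-accumulate per iteration.
+    acc = 0.0
+    for k in range(len(x)):
+        acc += x[k] * y[k]
+    return acc
+
+def dot_tensorized(x, y):
+    # After tensorization: the inner loop steps in chunks of 4,
+    # each chunk handled by the intrinsic (length divisible by 4).
+    acc = 0.0
+    for k in range(0, len(x), 4):
+        acc += dot4(x[k:k+4], y[k:k+4])
+    return acc
+
+x = np.arange(8, dtype="float32")
+y = np.ones(8, dtype="float32")
+assert dot_loop_level(x, y) == dot_tensorized(x, y)
+</code></pre></div></div>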
+
+<p>The key point to highlight here is <strong>cross-layer interaction</strong>. Our example shows interactions between: (1) the computational graph and tensor programs; (2) the computational graph and runtime libraries; and (3) tensor programs and hardware primitives, through ongoing automatic tensorization development in TensorIR. These cross-layer interactions open doors for making <strong>incremental optimizations</strong> at the boundary. For example, we can build a customized pas [...]
+
+<p>In addition to the unification of abstraction layers, we are also working on unifying the shape representation to enable <strong>first-class symbolic shape support</strong> across the stack. In our particular example, the symbolic shape dimensions (n, m) can flow across the abstractions and enable advanced optimizations for dynamic workloads. These capabilities open doors for optimizing both training and inference workloads.</p>
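+
+<p>A minimal sketch of what first-class symbolic shapes enable, using sympy symbols as a hypothetical stand-in for TVM’s shape variables (this is not TVM code): the dimensions n and m stay symbolic while shape rules are checked and propagated through matmul and flatten, mirroring the example above:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sympy import symbols
+
+n, d, m = symbols("n d m", positive=True, integer=True)
+
+def matmul_shape(a, b):
+    # (n, d) times (d, m) yields (n, m); the inner dimensions
+    # must match symbolically, not just numerically.
+    assert a[1] == b[0]
+    return (a[0], b[1])
+
+def flatten_shape(a):
+    # (n, m) flattens to (n * m,); the product stays symbolic.
+    return (a[0] * a[1],)
+
+lv0 = matmul_shape((n, d), (d, m))   # (n, m)
+lv1 = flatten_shape(lv0)             # (m*n,)
+</code></pre></div></div>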
+
+<h3 id="unifying-perspectives">Unifying Perspectives</h3>
+
+<p>Better ML systems require collaboration between ML scientists, ML engineers, and hardware engineers. The coming era of diverse specialized ML hardware will require coordinated effort from teams that include all three groups. By building rich, bidirectional interfaces between the layers in the system stack, TVM Unity aims to be the medium through which this collaboration and iteration happens.</p>
+
+<p>Abstractions in TVM can catalyze the lifecycle of an improvement to an AI application. At the highest level, an ML scientist can specify the operator they need to construct the next generation of a model. ML engineers can work at the tensor computation level to make this new operation efficient. Finally, these tensor computations can rely on hardware primitives written by hardware engineers. The work at each level will interact through Python APIs within the TVM ecosystem. The ability [...]
+
+<h3 id="automation">Automation</h3>
+
+<p>A unified ML system creates a new, larger search space than a system stack with strict boundaries. Decisions within tensor computations can influence the structure of the operator graph, and new hardware primitives can drastically change the optimal mappings at every other layer.</p>
+
+<p>TVM Unity will expose all these cross-layer interactions for automated optimization. Finding the best implementation for a given application will require learning-driven optimization: using ML to optimize ML by exploring the expanded joint search space and minimizing the computational cost.</p>
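+
+<p>A toy sketch of such a search loop (generic Python, not TVM’s actual auto-scheduler): a simple learned cost model ranks candidate implementations so that only the most promising ones pay for an expensive real measurement, and every measurement refines the model:</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import random
+
+def search(candidates, measure, rounds=3, top_k=2):
+    # A memoized "cost model" ranks candidates; only the top few
+    # are actually measured, and each round refines the ranking.
+    observed = {}
+    predict = lambda c: observed.get(c, 0.0)  # optimistic for unseen
+    for _ in range(rounds):
+        for cand in sorted(candidates, key=predict)[:top_k]:
+            observed[cand] = measure(cand)    # expensive real run
+    return min(observed, key=observed.get)
+
+# Toy stand-in: candidates are tile sizes, the cost is synthetic.
+candidates = [1, 2, 4, 8, 16, 32]
+measure = lambda t: abs(t - 8) + random.random() * 0.1
+print(search(candidates, measure))
+</code></pre></div></div>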
+
+<p>We also want to leverage domain experts’ help when possible, and create mechanisms that effectively incorporate domain information to guide the automatic optimizations.</p>
+
+<h2 id="new-capabilities-with-unity">New Capabilities with Unity</h2>
+
+<p>The Unity vision guides the technical roadmap for TVM’s evolution over the next year. The unified approach will position TVM to offer new forms of automation and ecosystem integration that are not possible with today’s system stacks.</p>
+
+<p>With Unity, TVM will unify library-based computation with compiler-based automation. AI applications will be able to combine the world’s best-known code for common operators with automatically optimized code for computations that don’t map neatly onto any existing operator. Developers will be able to smoothly transition between both strategies without a steep “performance cliff” when switching from built-in to generated code. Teams will be able to iterate rapidly with compiled code fo [...]
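+
+<p>One way to picture that unification is a dispatcher that prefers a tuned library kernel and falls back to generated code when no library covers an operator (a hypothetical sketch, not TVM’s actual mechanism):</p>
+
+<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
+
+# Hand-tuned library kernels, keyed by operator name.
+LIBRARY = {"matmul": lambda x, w: x @ w}
+
+# Compiler-generated fallbacks for operators no library covers.
+GENERATED = {"fused_scale_relu": lambda x, s: np.maximum(x * s, 0)}
+
+def dispatch(op, *args):
+    # Prefer the tuned library kernel; otherwise fall back to
+    # generated code, with no change visible to the caller.
+    kernel = LIBRARY.get(op) or GENERATED[op]
+    return kernel(*args)
+
+x = np.ones((2, 3), dtype="float32")
+w = np.ones((3, 4), dtype="float32")
+y = dispatch("matmul", x, w)              # library path
+z = dispatch("fused_scale_relu", y, 0.5)  # generated path
+</code></pre></div></div>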
+
+<p>TVM also aims to serve as a bridge to unify the broader ML and hardware ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not constrain teams’ choice of frameworks. TVM models will be easy to embed into other frameworks and runtimes as subgraphs for both training and inference. Through exchange formats like <a href="https://onnx.ai/">ONNX</a> and <a href="https://pytorch.org/docs/stable/jit.html">TorchScript</a>, TVM models can fluidly integrate into larger appli [...]
+
+<p><img src="/images/tvm-unity/image3.png" alt="image" style="width: 50%; margin: auto; display: block;" /></p>
+
+<p>Beyond TVM alone, the same forces that are driving TVM Unity exist across the theory and practice of modern ML. Rapid changes to models, emerging alternative hardware, and aging abstraction boundaries all point toward the need for an integrated approach. We expect TVM to lead the way into the next great industry-wide shift in ML systems.</p>
+
+<p>For more details about our vision for TVM, check out <a href="https://www.tvmcon.org">TVMCon 2021</a> for talks and discussion.</p>
+
+    </div>
+  </div>
+</div>
+</div>
+
+    
+
+
+
+
+  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
+  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
+  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
+  <!-- <script src="./assets/js/slick.js"></script> -->
+  <script src="/assets/js/custome.js"></script>
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+</body>
+<section class="footerSec">
+  <div class="footerHeader">
+    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+      <li class="logo">
+
+        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
+      </li>
+      <li class="copywrite d-flex align-items-center">
+        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All rights reserved</h5>
+      </li>
+    </ul>
+
+  </div>
+
+  <ul class="container">
+    <li class="footernote">
+      Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
+  </ul>
+
+</section>
+</html>
diff --git a/atom.xml b/atom.xml
index f6d106c..dc9a83e 100644
--- a/atom.xml
+++ b/atom.xml
@@ -4,7 +4,7 @@
  <title>TVM</title>
  <link href="https://tvm.apache.org" rel="self"/>
  <link href="https://tvm.apache.org"/>
- <updated>2021-11-24T16:21:44-08:00</updated>
+ <updated>2021-12-15T19:32:17-08:00</updated>
  <id>https://tvm.apache.org</id>
  <author>
    <name></name>
@@ -13,6 +13,120 @@
 
  
  <entry>
+   <title>Apache TVM Unity: a vision for the ML software &amp; hardware ecosystem in 2022</title>
+   <link href="https://tvm.apache.org/2021/12/15/tvm-unity"/>
+   <updated>2021-12-15T00:00:00-08:00</updated>
+   <id>https://tvm.apache.org/2021/12/15/tvm-unity</id>
+   <content type="html">&lt;p&gt;Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader shift coming in the way that machine learning system stacks optimize for flexibility and agility in the face of a rapidly changing hardware landscape. TVM will evolve to break down the boundaries that constrain the ways current ML systems adapt to rapid changes in ML models and the accelerators that implement them.&lt;/p&gt;
+
+&lt;h2 id=&quot;boundaries-in-the-modern-ml-system-stack&quot;&gt;Boundaries in the Modern ML System Stack&lt;/h2&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image4.png&quot; alt=&quot;image&quot; style=&quot;width: 40%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;The system stack for modern machine learning consists of four kinds of abstractions:&lt;/p&gt;
+&lt;ol&gt;
+  &lt;li&gt;The &lt;em&gt;computational graph&lt;/em&gt; abstraction encodes the flow of data between coarse-grained tensor operators. Computational graphs are the high-level abstraction users interact with in &lt;a href=&quot;https://www.tensorflow.org/&quot;&gt;TensorFlow&lt;/a&gt;, &lt;a href=&quot;https://mxnet.apache.org/&quot;&gt;MXNet&lt;/a&gt;, and &lt;a href=&quot;https://pytorch.org/&quot;&gt;PyTorch&lt;/a&gt;.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Tensor programs&lt;/em&gt; implement the code for the operators in the computational graph. Deep learning compilers generate the low-level C++ or CUDA code for computations like convolutions or matrix multiplications.&lt;/li&gt;
+  &lt;li&gt;Similarly, &lt;em&gt;libraries and runtimes&lt;/em&gt; include pre-written code to execute and orchestrate tensor operations. BLAS packages and libraries like cuDNN provide extensively tuned operator implementations for specific hardware targets.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Hardware primitives&lt;/em&gt; are at the bottom of the stack. Here, low-level assembly languages and hardware accelerator interfaces expose the raw capabilities of the machine.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;There are &lt;em&gt;vertical&lt;/em&gt; boundaries between the abstraction levels that prohibit cross-layer interactions and feedback between the levels. There is also a &lt;em&gt;horizontal&lt;/em&gt; boundary between two opposing ways that software stacks can treat the central tensor computation level. The horizontal boundary divides &lt;em&gt;library-based&lt;/em&gt; and &lt;em&gt;compilation-based&lt;/em&gt; approaches to tensor computation.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image1.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Library-based frameworks rely on collections of pre-made, carefully tuned operator implementations as their computational workhorse. Compilation-based frameworks instead generate their own custom tensor operation code from scratch.  Modern software stacks typically use one style or the other, but they don’t combine them: most deep learning frameworks are library-based, while most deep learning compilers cannot incorporate libraries and runtimes.&lt;/p&gt;
+
+&lt;p&gt;In the current landscape of ML systems, the boundaries between these layers tend to be strict. Neither approach is better than the other, but they have trade-offs. Library-based stacks excel on standard styles of ML models because they benefit from years of engineering investment in common operators. On the other hand, the flexibility and automation in compilation-based frameworks can be better for emerging models that require new operators.&lt;/p&gt;
+
+&lt;p&gt;Vertical boundaries exist in both styles of software stack. AI applications start at the top of the stack and march through the layers from top to bottom. Frameworks choose data layout and operator fusion strategies at the graph level; then the tensor computations carry out the operators selected in the computational graph; and these operators map onto a fixed set of hardware primitives. It’s a one-shot, unidirectional workflow: performance constraints at the level of tensor pro [...]
+
+&lt;p&gt;Both vertical and horizontal boundaries are slowing down the pace of innovation in machine learning. New hardware accelerators are emerging with new levels of capability and performance, but harnessing them will require fluid collaboration between ML scientists, ML engineers, and hardware vendors, a collaboration that these boundaries prevent. To cope with the rapid pace of change in ML systems, frameworks need to support &lt;strong&gt;incremental&lt;/strong&gt; evolution: incorporating new capabili [...]
+
+&lt;h2 id=&quot;tvm-unity&quot;&gt;TVM Unity&lt;/h2&gt;
+
+&lt;p&gt;The TVM Unity vision is about breaking down these barriers. The goal is to enable cross-layer interactions and automate their optimization. It is not to collapse the abstraction layers into a monolith: there is no “silver bullet” representation for AI programs that simultaneously enables optimization at every level. Instead, TVM Unity will build interfaces for the abstractions to interact and exchange information.&lt;/p&gt;
+
+&lt;p&gt;Removing the strict barriers between the levels in the system stack will enable new kinds of optimization that work jointly across the layers. A unified view of the entire system will let TVM automatically co-optimize decisions in the computation graph, the tensor operators, and the hardware mapping to search for the best possible implementation of an AI application. At the same time, TVM Unity will also serve as a communication substrate for interactions between ML scientists,  [...]
+
+&lt;h3 id=&quot;unifying-abstractions&quot;&gt;Unifying Abstractions&lt;/h3&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image2.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will focus on letting AI applications fluidly cross the boundaries between operator graphs, tensor programs, and hardware primitives. In TVM, a single Python program can define a core tensor operation, incorporate a custom hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tir&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relax&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;
+
+&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;script&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ir_module&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyIRModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
+    &lt;span class=&quot;c1&quot;&gt;# Define a TIR based operation.
+&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prim_func&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tir_mm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class= [...]
+                   &lt;span class=&quot;n&quot;&gt;W&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+                   &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span cl [...]
+            &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;body&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+                &lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;sp [...]
+                &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+                    &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
+                &lt;span class=&quot;c1&quot;&gt;# Can be mapped on to HW intrinsics.
+&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt; [...]
+
+    &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relax_func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span cl [...]
+        &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+            &lt;span class=&quot;c1&quot;&gt;# Invoke the TIR code.
+&lt;/span&gt;            &lt;span class=&quot;n&quot;&gt;lv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span [...]
+            &lt;span class=&quot;n&quot;&gt;lv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;sp [...]
+            &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;float32&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt; [...]
+            &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+        &lt;span class=&quot;c1&quot;&gt;# Invoke external update rule.
+&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_packed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;custom_inplace_update&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
+
+&lt;p&gt;This code has both a tensor program (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tir_mm&lt;/code&gt;) and a computational graph that includes it (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;relax_func&lt;/code&gt;). The high-level data flow can directly invoke the low-level tensor manipulation to build up a larger computation. The TVM runtime unifies the operator graph and compiler-based tensor computation to optimize the entire progra [...]
+
+&lt;p&gt;Additionally, TensorIR opens doors to exploit hardware primitives through tensorization. Tensorization transforms loop-level programs to implementations that map onto the primitives that a particular hardware target declares.&lt;/p&gt;
+
+&lt;p&gt;The key point to highlight here is &lt;strong&gt;cross-layer interaction&lt;/strong&gt;. Our example shows interactions between: (1) the computational graph and tensor programs; (2) the computational graph and runtime libraries; and (3) tensor programs and hardware primitives, through ongoing automatic tensorization development in TensorIR. These cross-layer interactions open doors for making &lt;strong&gt;incremental optimizations&lt;/strong&gt; at the boundary. For example, [...]
+
+&lt;p&gt;In addition to the unification of abstraction layers, we are also working on unifying the shape representation to enable &lt;strong&gt;first-class symbolic shape support&lt;/strong&gt; across the stack. In our particular example, the symbolic shape dimensions (n, m) can flow across the abstractions and enable advanced optimizations for dynamic workloads. These capabilities open doors for optimizing both training and inference workloads.&lt;/p&gt;
+
+&lt;h3 id=&quot;unifying-perspectives&quot;&gt;Unifying Perspectives&lt;/h3&gt;
+
+&lt;p&gt;Better ML systems require collaboration between ML scientists, ML engineers, and hardware engineers. The coming era of diverse specialized ML hardware will require coordinated effort from teams that include all three groups. By building rich, bidirectional interfaces between the layers in the system stack, TVM Unity aims to be the medium through which this collaboration and iteration happens.&lt;/p&gt;
+
+&lt;p&gt;Abstractions in TVM can catalyze the lifecycle of an improvement to an AI application. At the highest level, an ML scientist can specify the operator they need to construct the next generation of a model. ML engineers can work at the tensor computation level to make this new operation efficient. Finally, these tensor computations can rely on hardware primitives written by hardware engineers. The work at each level will interact through Python APIs within the TVM ecosystem. The a [...]
+
+&lt;h3 id=&quot;automation&quot;&gt;Automation&lt;/h3&gt;
+
+&lt;p&gt;A unified ML system creates a new, larger search space than a system stack with strict boundaries. Decisions within tensor computations can influence the structure of the operator graph, and new hardware primitives can drastically change the optimal mappings at every other layer.&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will expose all these cross-layer interactions for automated optimization. Finding the best implementation for a given application will require learning-driven optimization: using ML to optimize ML by exploring the expanded joint search space and minimizing the computational cost.&lt;/p&gt;
+
+&lt;p&gt;We also want to leverage domain experts’ help when possible, and create mechanisms that effectively incorporate domain information to guide the automatic optimizations.&lt;/p&gt;
+
+&lt;h2 id=&quot;new-capabilities-with-unity&quot;&gt;New Capabilities with Unity&lt;/h2&gt;
+
+&lt;p&gt;The Unity vision guides the technical roadmap for TVM’s evolution over the next year. The unified approach will position TVM to offer new forms of automation and ecosystem integration that are not possible with today’s system stacks.&lt;/p&gt;
+
+&lt;p&gt;With Unity, TVM will unify library-based computation with compiler-based automation. AI applications will be able to combine the world’s best-known code for common operators with automatically optimized code for computations that don’t map neatly onto any existing operator. Developers will be able to smoothly transition between both strategies without a steep “performance cliff” when switching from built-in to generated code. Teams will be able to iterate rapidly with compiled c [...]
+
+&lt;p&gt;TVM also aims to serve as a bridge to unify the broader ML and hardware ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not constrain teams’ choice of frameworks. TVM models will be easy to embed into other frameworks and runtimes as subgraphs for both training and inference. Through exchange formats like &lt;a href=&quot;https://onnx.ai/&quot;&gt;ONNX&lt;/a&gt; and &lt;a href=&quot;https://pytorch.org/docs/stable/jit.html&quot;&gt;TorchScript&lt;/a&gt;,  [...]
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image3.png&quot; alt=&quot;image&quot; style=&quot;width: 50%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Beyond TVM alone, the same forces that are driving TVM Unity exist across the theory and practice of modern ML. Rapid changes to models, emerging alternative hardware, and aging abstraction boundaries all point toward the need for an integrated approach. We expect TVM to lead the way into the next great industry-wide shift in ML systems.&lt;/p&gt;
+
+&lt;p&gt;For more details about our vision for TVM, check out &lt;a href=&quot;https://www.tvmcon.org&quot;&gt;TVMCon 2021&lt;/a&gt; for talks and discussion.&lt;/p&gt;
+</content>
+ </entry>
+ 
+ <entry>
    <title>Introducing TVM Auto-scheduler (a.k.a. Ansor)</title>
    <link href="https://tvm.apache.org/2021/03/03/intro-auto-scheduler"/>
    <updated>2021-03-03T00:00:00-08:00</updated>
@@ -152,7 +266,7 @@ sparse operators, low-precision operators, and dynamic shape better.&lt;/p&gt;
 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
 &lt;p&gt;When designing accelerators, an important decision is how one will approximately represent real numbers in hardware.
-This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
+This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
 Yet,
   when trying to squeeze
   the most out of hardware
@@ -172,7 +286,7 @@ Due to the lax numerical requirements
   this truncation often has no effect
   on model accuracy,
   while instantly cutting the storage cost
-  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
+  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
 
 &lt;p&gt;Before researchers begin building hardware for their datatype, however, they first need to determine how their datatype will behave numerically in the workloads they care about.
 This often involves first building a software-emulated version of their datatype
@@ -242,7 +356,7 @@ These steps are akin to
   &lt;em&gt;declaration&lt;/em&gt; and &lt;em&gt;implementation&lt;/em&gt; of the datatype,
   respectively.&lt;/p&gt;
 
-&lt;p&gt;Please note that all referred code in this post are based on TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
+&lt;p&gt;Please note that all referred code in this post are based on TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
 
 &lt;h3 id=&quot;datatype-registration&quot;&gt;Datatype Registration&lt;/h3&gt;
 
@@ -1126,7 +1240,7 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
                 &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'...'&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;attr_str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;
-            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;f'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}{&lt;/span&gt;&lt [...]
+            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;s [...]
             &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt [...]
             &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args_with_edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]),&lt;/span [...]
@@ -1472,7 +1586,7 @@ A standard µTVM setup, where the host communicates with the device via JTAG.&lt
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph_runtime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;micro_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt [...]
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CIFAR10_CLASSES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt; [...]
-  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;f'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;Below are the performance results of MicroTVM, compared with &lt;a href=&quot;https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0&quot;&gt;CMSIS-NN version 5.7.0&lt;/a&gt; (commit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a65b7c9a&lt;/code&gt;), a hand-optimized library of ML kernels.&lt;/p&gt;
@@ -4408,231 +4522,5 @@ make jvminstall
 </content>
  </entry>
  
- <entry>
-   <title>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm</title>
-   <link href="https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm"/>
-   <updated>2017-10-30T00:00:00-07:00</updated>
-   <id>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</id>
-   <content type="html">&lt;p style=&quot;text-align: center&quot;&gt;Aditya Atluri, Advanced Micro Devices, Inc.&lt;/p&gt;
-&lt;p style=&quot;text-align: center&quot;&gt;Masahiro Masuda, Ziosoft, Inc.&lt;/p&gt;
-
-&lt;p&gt;We are pleased to announce a new GPU backend for TVM stack - ROCm backend for AMD GPUs. If you are not familiar with TVM, you can refer to &lt;a href=&quot;http://tvmlang.org/2017/08/17/tvm-release-announcement.html&quot;&gt;the earlier announcement&lt;/a&gt; first. In short, TVM stack is an end to end compilation stack to deploy deep learning workloads to all hardware backends. Today’s announcement focuses on the code generator support for AMD GPUs. Specifically, we developed a [...]
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/tvm_rocm_overview.png&quot; alt=&quot;image&quot; width=&quot;90%&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;TVM stack is developed by an open source community under Apache-2.0 License. ROCm backend support is done with the help from community. Aditya first implemented codegen and runtime. He was later joined by Masahiro. Masahiro’s full time job is not related to TVM or AMD GPUs. Nonetheless, TVM got him excited and he has been involved in fixing bugs, resolving all failing unittests, and adding math function support to codegen.&lt;/p&gt;
-
-&lt;h2 id=&quot;rocm-stack&quot;&gt;ROCm stack&lt;/h2&gt;
-
-&lt;p&gt;Radeon Open Compute is open-source initiative by AMD to leverage compute power of current and future generation GPUs. ROCm software stack is a great tool to express and run most commonly used GPU programming models and achieve peak performance. Not only ROCm is an open-source stack, it is an open stack, which means all the ISA and hardware features are well documented and programmable by developers. Developers can experiment with different programming models and try out multiple [...]
-
-&lt;p&gt;TVM leverages the open-source feature of ROCm stack by using LLVM AMDGPU backend code generator. TVM translates from its intermediate representation (IR) to LLVM intermediate representation. This is the place where ROCm stack open-source feature takes control. TVM’s LLVM AMDGPU CodeGen pass converts LLVM IR into GPU assembly and object code, which is later called to run the whole network or group of layers or single layer.&lt;/p&gt;
-
-&lt;p&gt;On ROCm stack, there is no virtual ISA, you get what you ask for not less not more. Hence, one can schedule operations in a kernel at a granularity of a single instruction, without worrying about instruction reordering and other optimizations you do not ask for.&lt;/p&gt;
-
-&lt;h2 id=&quot;using-nnvm-compiler-with-rocm-backend&quot;&gt;Using NNVM Compiler with ROCm backend&lt;/h2&gt;
-
-&lt;p&gt;Thanks to TVM stack, we can directly compile models from popular deep learning frameworks such as MXNet and PyTorch into AMD GPU assembly using NNVM compiler, today. With ROCm backend, the generic workflow becomes as follows.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/rocm_workflow.png&quot; alt=&quot;image&quot; width=&quot;90%&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;We have put together working examples of compiling models from MXNet and PyTorch with NNVM, and running them on AMD GPUs with ROCm backend. More frameworks are supported via the NNVM compiler stack. The repository is available &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;The script &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/mxnet_imagenet_inference.py&quot;&gt;mxnet_imagenet_inference.py&lt;/a&gt; demonstrates Imagenet inference on AMD GPUs with recently introduced MXNet-Gluon model. It does the following:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Loads Resnet 50 model from &lt;a href=&quot;https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html&quot;&gt;the Gluon model zoo&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Converts Gluon Resnet 50 model to NNVM graph format, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nnvm.frontend.from_mxnet (...)&lt;/code&gt;&lt;/li&gt;
-  &lt;li&gt;Compiles and executes the graph with ROCm backend&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;The example comes with an image of the following cat.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/cat.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Running our network, it predicts this image as “tigar cat”, among 1000 categories.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-plain&quot; data-lang=&quot;plain&quot;&gt;$ python mxnet_imagenet_inference.py
-Testing model resnet50_v1
-x (1, 3, 224, 224)
-TVM prediction top-1: 282 tiger cat&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;The script &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/advanced_superres_onnx.py&quot;&gt;advanced_superres_onnx.py&lt;/a&gt; gives an example of loading a model trained with PyTorch. The model is stored in the &lt;a href=&quot;https://onnx.ai/&quot;&gt;ONNX&lt;/a&gt; format. In this example, our network takes an low resolution image as input, and outputs a 4x high resolution image. We refer the details of a problem setup and the network archit [...]
-
-&lt;p&gt;In order to use models in the ONNX format with NNVM, we first use &lt;a href=&quot;https://github.com/onnx/onnx&quot;&gt;the ONNX library&lt;/a&gt; to load the ONNX model into the Protocol buffer object. We can then use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nnvm.frontend.from_onnx(...)&lt;/code&gt; to obtain an equivalent NNVM graph. With a NNVM graph in hand, we can follow the generic workflow of compilation and graph execution outlined above.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/butterfly.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;The input to the network is a 64 x 64 image on the left, and it outputs a 256 x 256 image on the right. On the middle is a 256 x 256 image obtained simply by resizing the input image with bicubic interpolation. The network outputs an image of far better quality.&lt;/p&gt;
-
-&lt;p&gt;The input images are taken from the original paper, and they are available &lt;a href=&quot;https://twitter.app.box.com/s/lcue6vlrd01ljkdtdkhmfvk7vtjhetog&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;a-note-on-performance&quot;&gt;A Note on performance&lt;/h2&gt;
-
-&lt;p&gt;The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running &lt;a href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the gemm test script&lt;/a&gt; in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplica [...]
-This is already a promising start, as it is very hard to optimize performance to get to peak and we
-did not yet apply AMD GPU specific optimizations.
-We are starting to look at performance optimization and we expect more improvement to come.&lt;/p&gt;
-
-&lt;h2 id=&quot;walkthrough-of-rocm-backend&quot;&gt;Walkthrough of ROCm backend&lt;/h2&gt;
-
-&lt;p&gt;In the following part of this article we focus on explaining how to use ROCm backend when working with TVM directly. All you need to do is to build your TVM function under the target “rocm” and create a runtime context for it. Here, we show an example of ROCm backend usage, following ‘Vector Add Example’ in TVM’s &lt;a href=&quot;http://docs.tvmlang.org/tutorials/get_started.html#vector-add-example&quot;&gt;getting started tutorial&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;We start by setting up a compute operation and a schedule for the vector add kernel. This step is independent of a backend.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;__future__&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;absolute_import&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;print_function&lt;/span&gt;
-&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm&lt;/span&gt;
-&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;n&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span cl [...]
-&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span cl [...]
-&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&q [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_schedule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;bx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&q [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&q [...]
-
-&lt;p&gt;Next, to use ROCm backend we build our kernel under “rocm” target. This will cause TVM to use our new code generator. We also need a runtime context for ROCm backend.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;rocm&quot;&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class= [...]
-&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;After building the kernel and setting up a runtime context, we can launch our vector add kernel.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-
-&lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;testing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assert_allclose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asnumpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &l [...]
-
-&lt;p&gt;We can view LLVM IR that TVM generates in the following way:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imported_modules&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;spa [...]
-&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;llvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;You should see something like this:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-llvm&quot; data-lang=&quot;llvm&quot;&gt;&lt;span class=&quot;c1&quot;&gt;; ModuleID = 'myadd__kernel0'&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;source_filename&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;myadd__kernel0&quot;&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;datalayout&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64&quot;&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;triple&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;amdgcn-amd-amdhsa-hcc&quot;&lt;/span&gt;
-
-
-&lt;span class=&quot;c1&quot;&gt;; Function Attrs: nounwind&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;define&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;dllexport&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;amdgpu_kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@myadd__kernel0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=& [...]
-&lt;span class=&quot;nl&quot;&gt;entry:&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workgroup.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workitem.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%6&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;-127&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%7&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ashr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%8&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;icmp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;slt&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%7&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i1&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_then&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_else&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_then:&lt;/span&gt;                                          &lt;span class=&quot;c1&quot;&gt;; preds = %entry&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%9&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;shl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end.sink.split&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_end.sink.split:&lt;/span&gt;                                &lt;span class=&quot;c1&quot;&gt;; preds = %if_else, %if_then&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;phi&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span [...]
-  &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%12&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%13&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
-  &lt;span class=&quot;nv&quot;&gt;%15&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
-  &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fadd&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%18&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%19&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;k&quot;&gt;store&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%19&lt;/span [...]
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_end:&lt;/span&gt;                                           &lt;span class=&quot;c1&quot;&gt;; preds = %if_end.sink.split, %if_else&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_else:&lt;/span&gt;                                          &lt;span class=&quot;c1&quot;&gt;; preds = %entry&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%20&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;shl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%22&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;icmp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;slt&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%20&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i1&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end.sink.split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end&lt;/span&gt;&lt;span class=&quot;p&quot;& [...]
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;We can also view GPU assembly that ROCm backend generates. This is the real code that runs on your GPU.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;asm&quot;&lt;/span&gt;&lt;sp [...]
-
-&lt;p&gt;The assembly should look something like this, omitting unnecessary details:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-plain&quot; data-lang=&quot;plain&quot;&gt;        s_load_dword s1, s[4:5], 0x18
-        v_mov_b32_e32 v2, -1
-        v_mov_b32_e32 v1, 0
-        s_waitcnt lgkmcnt(0)
-        s_add_i32 s0, s1, 0xffffff81
-        s_ashr_i32 s0, s0, 6
-        s_cmp_ge_i32 s6, s0
-        s_cbranch_scc0 BB0_2
-        v_sub_i32_e32 v1, vcc, s1, v0
-        s_lshl_b32 s0, s6, 6
-        v_cmp_lt_i32_e32 vcc, s0, v1
-        v_mov_b32_e32 v2, 0
-        v_cndmask_b32_e64 v1, 0, -1, vcc
-BB0_2:
-        v_cmp_ne_u32_e32 vcc, 0, v2
-        v_cndmask_b32_e64 v2, 0, 1, vcc
-        v_cmp_ne_u32_e32 vcc, 1, v2
-        s_and_b64 vcc, exec, vcc
-        s_cbranch_vccnz BB0_4
-        s_lshl_b32 s0, s6, 6
-        v_mov_b32_e32 v1, -1
-BB0_4:
-        v_cmp_ne_u32_e32 vcc, 0, v1
-        v_mov_b32_e32 v1, s0
-        s_and_saveexec_b64 s[0:1], vcc
-        s_xor_b64 s[0:1], exec, s[0:1]
-        s_cbranch_execz BB0_6
-BB0_5:
-        s_load_dwordx2 s[2:3], s[4:5], 0x0
-        s_load_dwordx2 s[6:7], s[4:5], 0x8
-        v_add_i32_e32 v0, vcc, v1, v0
-        s_load_dwordx2 s[4:5], s[4:5], 0x10
-        v_ashrrev_i32_e32 v1, 31, v0
-        v_lshlrev_b64 v[0:1], 2, v[0:1]
-        s_waitcnt lgkmcnt(0)
-        v_add_i32_e32 v2, vcc, s4, v0
-        v_mov_b32_e32 v3, s5
-        v_addc_u32_e32 v3, vcc, v3, v1, vcc
-        flat_load_dword v2, v[2:3]
-        v_add_i32_e32 v4, vcc, s6, v0
-        v_mov_b32_e32 v3, s7
-        v_addc_u32_e32 v5, vcc, v3, v1, vcc
-        flat_load_dword v4, v[4:5]
-        v_mov_b32_e32 v3, s3
-        v_add_i32_e32 v0, vcc, s2, v0
-        v_addc_u32_e32 v1, vcc, v3, v1, vcc
-        s_waitcnt vmcnt(0) lgkmcnt(0)
-        v_add_f32_e32 v2, v2, v4
-        flat_store_dword v[0:1], v2
-BB0_6:
-        s_or_b64 exec, exec, s[0:1]
-        s_endpgm&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;Links&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Github page of NNVM Compiler: &lt;a href=&quot;https://github.com/dmlc/nnvm&quot;&gt;https://github.com/dmlc/nnvm&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Github page of TVM: &lt;a href=&quot;https://github.com/dmlc/tvm&quot;&gt;https://github.com/dmlc/tvm&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Examples of ROCm backend with NNVM: &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&quot;&gt;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-</content>
- </entry>
- 
  
 </feed>
diff --git a/blog.html b/blog.html
index ae12173..d4e96a7 100644
--- a/blog.html
+++ b/blog.html
@@ -146,6 +146,16 @@
 
 <li>
   <span>
+    <a class="post-link" href="/2021/12/15/tvm-unity">Apache TVM Unity: a vision for the ML software & hardware ecosystem in 2022</a>
+  </span>
+  </br>
+  <span>
+    Dec 15, 2021
+  </span>
+</li>
+
+<li>
+  <span>
     <a class="post-link" href="/2021/03/03/intro-auto-scheduler">Introducing TVM Auto-scheduler (a.k.a. Ansor)</a>
   </span>
   </br>
diff --git a/feed.xml b/feed.xml
index ee9bf88..5d9b9d2 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,109 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2021-11-24T16:21:44-08:00</updated><id>/feed.xml</id><title type="html">TVM</title><author><name>{&quot;name&quot;=&gt;nil}</name></author><entry><title type="html">Introducing TVM Auto-scheduler (a.k.a. Ansor)</tit [...]
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2021-12-15T19:32:17-08:00</updated><id>/feed.xml</id><title type="html">TVM</title><author><name>{&quot;name&quot;=&gt;nil}</name></author><entry><title type="html">Apache TVM Unity: a vision for the ML software &am [...]
+
+&lt;h2 id=&quot;boundaries-in-the-modern-ml-system-stack&quot;&gt;Boundaries in the Modern ML System Stack&lt;/h2&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image4.png&quot; alt=&quot;image&quot; style=&quot;width: 40%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;The system stack for modern machine learning consists of four kinds of abstractions:&lt;/p&gt;
+&lt;ol&gt;
+  &lt;li&gt;The &lt;em&gt;computational graph&lt;/em&gt; abstraction encodes the flow of data between coarse-grained tensor operators. Computational graphs are the high-level abstraction users interact with in &lt;a href=&quot;https://www.tensorflow.org/&quot;&gt;TensorFlow&lt;/a&gt;, &lt;a href=&quot;https://mxnet.apache.org/&quot;&gt;MXNet&lt;/a&gt;, and &lt;a href=&quot;https://pytorch.org/&quot;&gt;PyTorch&lt;/a&gt;.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Tensor programs&lt;/em&gt; implement the code for the operators in the computational graph. Deep learning compilers generate the low-level C++ or CUDA code for computations like convolutions or matrix multiplications.&lt;/li&gt;
+  &lt;li&gt;Similarly, &lt;em&gt;libraries and runtimes&lt;/em&gt; include pre-written code to execute and orchestrate tensor operations. BLAS packages and libraries like cuDNN provide extensively tuned operator implementations for specific hardware targets.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Hardware primitives&lt;/em&gt; are at the bottom of the stack. Here, low-level assembly languages and hardware accelerator interfaces expose the raw capabilities of the machine.&lt;/li&gt;
+&lt;/ol&gt;
+
+&lt;p&gt;There are &lt;em&gt;vertical&lt;/em&gt; boundaries between the abstraction levels that prohibit cross-layer interactions and feedback between the levels. There is also a &lt;em&gt;horizontal&lt;/em&gt; boundary between two opposing ways that software stacks can treat the central tensor computation level. The horizontal boundary divides &lt;em&gt;library-based&lt;/em&gt; and &lt;em&gt;compilation-based&lt;/em&gt; approaches to tensor computation.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image1.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Library-based frameworks rely on collections of pre-made, carefully tuned operator implementations as their computational workhorse. Compilation-based frameworks instead generate their own custom tensor operation code from scratch.  Modern software stacks typically use one style or the other, but they don’t combine them: most deep learning frameworks are library-based, while most deep learning compilers cannot incorporate libraries and runtimes.&lt;/p&gt;
+
+&lt;p&gt;In the current landscape of ML systems, the boundaries between these layers tend to be strict. Neither approach is strictly better than the other; each comes with trade-offs. Library-based stacks excel on standard styles of ML models because they benefit from years of engineering investment in common operators. On the other hand, the flexibility and automation of compilation-based frameworks can be better for emerging models that require new operators.&lt;/p&gt;
+
+&lt;p&gt;Vertical boundaries exist in both styles of software stack. AI applications start at the top of the stack and march through the layers from top to bottom. Frameworks choose data layout and operator fusion strategies at the graph level; then the tensor computations carry out the operators selected in the computational graph; and these operators map onto a fixed set of hardware primitives. It’s a one-shot, unidirectional workflow: performance constraints at the level of tensor pro [...]
+
+&lt;p&gt;Both vertical and horizontal boundaries are slowing down the pace of innovation in machine learning. New hardware accelerators are emerging with new levels of capability and performance, but harnessing them will require the fluid collaboration between ML scientists, ML engineers, and hardware vendors that these boundaries prevent. To cope with the rapid pace of change in ML systems, frameworks need to support &lt;strong&gt;incremental&lt;/strong&gt; evolution: incorporating new capabili [...]
+
+&lt;h2 id=&quot;tvm-unity&quot;&gt;TVM Unity&lt;/h2&gt;
+
+&lt;p&gt;The TVM Unity vision is about breaking down these barriers. The goal is to enable cross-layer interactions and automate their optimization. It is not to collapse the abstraction layers into a monolith: there is no “silver bullet” representation for AI programs that simultaneously enables optimization at every level. Instead, TVM Unity will build interfaces for the abstractions to interact and exchange information.&lt;/p&gt;
+
+&lt;p&gt;Removing the strict barriers between the levels in the system stack will enable new kinds of optimization that work jointly across the layers. A unified view of the entire system will let TVM automatically co-optimize decisions in the computation graph, the tensor operators, and the hardware mapping to search for the best possible implementation of an AI application. At the same time, TVM Unity will also serve as a communication substrate for interactions between ML scientists,  [...]
+
+&lt;h3 id=&quot;unifying-abstractions&quot;&gt;Unifying Abstractions&lt;/h3&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image2.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will focus on letting AI applications fluidly cross the boundaries between operator graphs, tensor programs, and hardware primitives. In TVM, a single Python program can define a core tensor operation, incorporate a custom hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tir&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relax&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;
+
+&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;script&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ir_module&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyIRModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
+    &lt;span class=&quot;c1&quot;&gt;# Define a TIR based operation.
+&lt;/span&gt;	&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prim_func&lt;/span&gt;
+	&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tir_mm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class= [...]
+                   &lt;span class=&quot;n&quot;&gt;W&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+                   &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;  &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span cl [...]
+            &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;body&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+                &lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;sp [...]
+		&lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+            &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
+        &lt;span class=&quot;c1&quot;&gt;# Can be mapped on to HW intrinsics.
+&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt; [...]
+
+	&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;
+	&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relax_func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span cl [...]
+        &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
+            &lt;span class=&quot;c1&quot;&gt;# Invoke the TIR code.
+&lt;/span&gt;            &lt;span class=&quot;n&quot;&gt;lv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span [...]
+            &lt;span class=&quot;n&quot;&gt;lv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;sp [...]
+            &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;float32&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt; [...]
+            &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+        &lt;span class=&quot;c1&quot;&gt;# Invoke external update rule.
+&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_packed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;custom_inplace_update&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
+
+&lt;p&gt;This code has both a tensor program (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tir_mm&lt;/code&gt;) and a computational graph that includes it (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;relax_func&lt;/code&gt;). The high-level data flow can directly invoke the low-level tensor manipulation to build up a larger computation. The TVM runtime unifies the operator graph and compiler-based tensor computation to optimize the entire progra [...]
+
+&lt;p&gt;Additionally, TensorIR opens the door to exploiting hardware primitives through tensorization. Tensorization transforms loop-level programs into implementations that map onto the primitives a particular hardware target declares.&lt;/p&gt;
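+
+&lt;p&gt;As a rough sketch of what tensorization looks like with today’s TE schedule API, a matmul can be tiled so its inner loop nest matches an intrinsic’s shape and then swapped for that intrinsic. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gemm_16x16_intrin&lt;/code&gt; handle below is hypothetical; a real one would be declared with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;te.decl_tensor_intrin&lt;/code&gt;.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tvm
+from tvm import te
+
+n = 1024
+A = te.placeholder((n, n), name='A')
+B = te.placeholder((n, n), name='B')
+k = te.reduce_axis((0, n), name='k')
+C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')
+
+s = te.create_schedule(C.op)
+# Tile so the innermost block matches the 16x16x16 intrinsic shape.
+io, jo, ii, ji = s[C].tile(C.op.axis[0], C.op.axis[1], 16, 16)
+ko, ki = s[C].split(C.op.reduce_axis[0], factor=16)
+s[C].reorder(io, jo, ko, ii, ji, ki)
+# Swap the inner nest for the hardware primitive (hypothetical handle):
+# s[C].tensorize(ii, gemm_16x16_intrin)
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;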
+
+&lt;p&gt;The key thing to highlight here is &lt;strong&gt;cross-layer interaction&lt;/strong&gt;. Our example shows interactions between: (1) the computational graph and tensor programs; (2) the computational graph and runtime libraries; and (3) tensor programs and hardware primitives, via the ongoing automatic tensorization work in TensorIR. These cross-layer interactions open doors for making &lt;strong&gt;incremental optimizations&lt;/strong&gt; at the boundaries. For example, [...]
+
+&lt;p&gt;In addition to unifying the abstraction layers, we are also working on unifying the shape representation to enable &lt;strong&gt;first-class symbolic shape support&lt;/strong&gt; across the stack. In our example, the symbolic shape dimensions (n, m) can flow across the abstractions and enable advanced optimizations for dynamic workloads. These additional capabilities will open doors for both training and inference workload optimizations.&lt;/p&gt;
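+
+&lt;p&gt;As a minimal sketch of the idea using the existing TE API, the kernel below is compiled once over symbolic dimensions and serves every concrete (n, m) at runtime; first-class symbolic shape support extends the same principle up through the computational graph level.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tvm
+from tvm import te
+
+# n and m stay symbolic through lowering and code generation.
+n = te.var('n')
+m = te.var('m')
+X = te.placeholder((n, m), name='X')
+Y = te.compute((n, m), lambda i, j: X[i, j] * 2.0, name='Y')
+s = te.create_schedule(Y.op)
+f = tvm.build(s, [X, Y], target='llvm')  # one binary for all shapes
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;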
+
+&lt;h3 id=&quot;unifying-perspectives&quot;&gt;Unifying Perspectives&lt;/h3&gt;
+
+&lt;p&gt;Better ML systems require collaboration between ML scientists, ML engineers, and hardware engineers. The coming era of diverse specialized ML hardware will require coordinated effort from teams that include all three groups. By building rich, bidirectional interfaces between the layers in the system stack, TVM Unity aims to be the medium through which this collaboration and iteration happens.&lt;/p&gt;
+
+&lt;p&gt;Abstractions in TVM can catalyze the lifecycle of an improvement to an AI application. At the highest level, an ML scientist can specify the operator they need to construct the next generation of a model. ML engineers can work at the tensor computation level to make this new operation efficient. Finally, these tensor computations can rely on hardware primitives written by hardware engineers. The work at each level will interact through Python APIs within the TVM ecosystem. The a [...]
+
+&lt;h3 id=&quot;automation&quot;&gt;Automation&lt;/h3&gt;
+
+&lt;p&gt;A unified ML system creates a new, larger search space than a system stack with strict boundaries. Decisions within tensor computations can influence the structure of the operator graph, and new hardware primitives can drastically change the optimal mappings at every other layer.&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will expose all these cross-layer interactions for automated optimization. Finding the best implementation for a given application will require learning-driven optimization: using ML to optimize ML by exploring the expanded joint search space and minimizing the computational cost.&lt;/p&gt;
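+
+&lt;p&gt;Today’s auto-scheduler is one concrete instance of this approach: a learned cost model ranks candidate schedules, and measurements on real hardware refine the model as the search proceeds. A minimal sketch, where the workload, trial count, and log file name are all illustrative:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tvm
+from tvm import te, auto_scheduler
+
+@auto_scheduler.register_workload
+def matmul(n, m, l):
+    A = te.placeholder((n, l), name='A')
+    B = te.placeholder((l, m), name='B')
+    k = te.reduce_axis((0, l), name='k')
+    C = te.compute((n, m), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')
+    return [A, B, C]
+
+task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024),
+                                 target=tvm.target.Target('llvm'))
+# Measured candidates train a cost model that steers the remaining search.
+task.tune(auto_scheduler.TuningOptions(
+    num_measure_trials=64,
+    measure_callbacks=[auto_scheduler.RecordToFile('matmul.json')]))
+sch, args = task.apply_best('matmul.json')
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;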
+
+&lt;p&gt;In addition, we want to leverage domain experts’ help where possible, and to create mechanisms that effectively incorporate domain knowledge to guide the automatic optimizations.&lt;/p&gt;
+
+&lt;h2 id=&quot;new-capabilities-with-unity&quot;&gt;New Capabilities with Unity&lt;/h2&gt;
+
+&lt;p&gt;The Unity vision guides the technical roadmap for TVM’s evolution over the next year. The unified approach will position TVM to offer new forms of automation and ecosystem integration that are not possible with today’s system stacks.&lt;/p&gt;
+
+&lt;p&gt;With Unity, TVM will unify library-based computation with compiler-based automation. AI applications will be able to combine the world’s best-known code for common operators with automatically optimized code for computations that don’t map neatly onto any existing operator. Developers will be able to smoothly transition between both strategies without a steep “performance cliff” when switching from built-in to generated code. Teams will be able to iterate rapidly with compiled c [...]
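+
+&lt;p&gt;The bring-your-own-codegen flow in today’s TVM hints at how the two strategies can mix: a graph is partitioned so the regions an external library supports are offloaded to it, while TVM compiles the rest. A sketch, assuming a Relay module &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mod&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;params&lt;/code&gt; from a frontend importer, and a backend named “dnnl” enabled in the build:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from tvm import relay
+
+# 'mod' and 'params' are assumed to come from a frontend importer,
+# e.g. relay.frontend.from_onnx. Mark operators the external library
+# implements, merge them into regions, and split the graph.
+mod = relay.transform.AnnotateTarget('dnnl')(mod)
+mod = relay.transform.MergeCompilerRegions()(mod)
+mod = relay.transform.PartitionGraph()(mod)
+# Library regions go to the external codegen; TVM compiles the rest.
+lib = relay.build(mod, target='llvm', params=params)
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;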
+
+&lt;p&gt;TVM also aims to serve as a bridge to unify the broader ML and hardware ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not constrain teams’ choice of frameworks. TVM models will be easy to embed into other frameworks and runtimes as subgraphs for both training and inference. Through exchange formats like &lt;a href=&quot;https://onnx.ai/&quot;&gt;ONNX&lt;/a&gt; and &lt;a href=&quot;https://pytorch.org/docs/stable/jit.html&quot;&gt;TorchScript&lt;/a&gt;,  [...]
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image3.png&quot; alt=&quot;image&quot; style=&quot;width: 50%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Beyond TVM alone, the same forces that are driving TVM Unity exist across the theory and practice of modern ML. Rapid changes to models, emerging alternative hardware, and aging abstraction boundaries all point toward the need for an integrated approach. We expect TVM to lead the way into the next great industry-wide shift in ML systems.&lt;/p&gt;
+
+&lt;p&gt;For more details about our vision for TVM, check out &lt;a href=&quot;https://www.tvmcon.org&quot;&gt;TVMCon 2021&lt;/a&gt; for talks and discussion.&lt;/p&gt;</content><author><name>Adrian Sampson, Tianqi Chen, Jared Roesch</name></author><summary type="html">Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader shift coming in the way that machine learning system stacks optimize for flexibility and agility in the face of a rapidly changing hardware [...]
 model size, operator diversity, and hardware heterogeneity.
 From a computational perspective, deep neural networks are just layers and layers of tensor computations.
 These tensor computations, such as matmul and conv2d, can be easily described by mathematical expressions.
@@ -123,7 +228,7 @@ sparse operators, low-precision operators, and dynamic shape better.&lt;/p&gt;
 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
 &lt;p&gt;When designing accelerators, an important decision is how one will approximately represent real numbers in hardware.
-This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
+This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
 Yet,
   when trying to squeeze
   the most out of hardware
@@ -143,7 +248,7 @@ Due to the lax numerical requirements
   this truncation often has no effect
   on model accuracy,
   while instantly cutting the storage cost
-  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
+  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
 
 &lt;p&gt;Before researchers begin building hardware for their datatype, however, they first need to determine how their datatype will behave numerically in the workloads they care about.
 This often involves first building a software-emulated version of their datatype
@@ -213,7 +318,7 @@ These steps are akin to
   &lt;em&gt;declaration&lt;/em&gt; and &lt;em&gt;implementation&lt;/em&gt; of the datatype,
   respectively.&lt;/p&gt;
 
-&lt;p&gt;Please note that all referred code in this post are based on TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
+&lt;p&gt;Please note that all referred code in this post are based on TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
 
 &lt;h3 id=&quot;datatype-registration&quot;&gt;Datatype Registration&lt;/h3&gt;
 
@@ -1077,7 +1182,7 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
                 &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'...'&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;attr_str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;
-            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;f'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}{&lt;/span&gt;&lt [...]
+            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;s [...]
             &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt [...]
             &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args_with_edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]),&lt;/span [...]
@@ -1413,7 +1518,7 @@ A standard µTVM setup, where the host communicates with the device via JTAG.&lt
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph_runtime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;micro_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt [...]
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CIFAR10_CLASSES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt; [...]
-  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;f'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;Below are the performance results of MicroTVM, compared with &lt;a href=&quot;https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0&quot;&gt;CMSIS-NN version 5.7.0&lt;/a&gt; (commit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a65b7c9a&lt;/code&gt;), a hand-optimized library of ML kernels.&lt;/p&gt;
@@ -2002,165 +2107,4 @@ We show that automatic optimization in TVM makes it easy and flexible to support
 
 &lt;p&gt;We would like to take this chance to thank the Allen School for supporting the SAMPL team that gave birth to the TVM project. We would also like to thank the Halide project which provided the basis for TVM’s loop-level IR and initial code generation. We would like to thank our Apache incubator mentors for introducing the project to Apache and providing useful guidance. Finally, we would like to thank the TVM community and all of the organizations, as listed above, that supported [...]
 
-&lt;p&gt;See also the &lt;a href=&quot;https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/&quot;&gt;Allen School news about the transition here&lt;/a&gt;, &lt;a href=&quot;https://sampl.cs.washington.edu/tvmconf/#about-tvmconf&quot;&gt;TVM conference program slides and recordings&lt;/a&gt;, and &lt;a href=&quot;https://tvm.apache.org/docs//contribute/community.html&quot;&gt;our community guideline here&lt;/a&gt;. Follow us o [...]
-
-&lt;p&gt;TVM is an open deep learning compiler stack to compile various deep learning models from different
-frameworks to CPU, GPU or specialized accelerators.  TVM supports model compilation from a wide range
-of front ends like Tensorflow, Onnx, Keras, Mxnet, Darknet, CoreML and Caffe2. TVM compiled modules
-can be deployed on backends like LLVM (Javascript or WASM, AMD GPU, ARM or X86), NVidia GPU (CUDA),
-OpenCL and Metal.&lt;/p&gt;
-
-&lt;p&gt;TVM supports runtime bindings for programming languages like Javascript, Java, Python, C++… and now Golang.
-With a wide range of frontend, backend and runtime bindings, TVM enables developers to integrate and
-deploy deep learning models from a variety of frameworks to a choice of hardware via many programming languages.&lt;/p&gt;
-
-&lt;p&gt;The TVM import and compilation process generates a graph JSON, a module and a params. Any application that
-integrates the TVM runtime can load these compiled modules and perform inference. A detailed tutorial of module
-import and compilation using TVM can be found at &lt;a href=&quot;https://tvm.apache.org/docs//tutorials/&quot;&gt;tutorials&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;TVM now supports deploying compiled modules through Golang. Golang applications can make use of this
-to deploy the deep learning models through TVM. The scope of this blog is the introduction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; package,
-the package build process and a sample application using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; to load a compiled module and perform inference.&lt;/p&gt;
-
-&lt;h2 id=&quot;package&quot;&gt;Package&lt;/h2&gt;
-
-&lt;p&gt;The golang package &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; is built on top of TVM’s C runtime interface. The API in this package
-abstracts the native C types and provides Golang compatible types. The package source can be found
-at &lt;a href=&quot;https://github.com/dmlc/tvm/tree/master/golang&quot;&gt;gotvm&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;This package leverages golang’s interface, slices, function closures and implicitly handles the
-necessary conversions across API calls.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/golang/TVM-Golang-Blog.png&quot; alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
-&lt;center&gt; Golang Interface over TVM Runtime &lt;/center&gt;
-&lt;p&gt;&lt;/p&gt;
-
-&lt;h2 id=&quot;how-to&quot;&gt;How to&lt;/h2&gt;
-
-&lt;p&gt;As shown in the below diagram &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; enables golang applications to integrate deep learning models
-from various frameworks without the hassle of understanding each framework related interface API.
-Developers can make use of TVM to import and compile deep learning models and generate TVM artifacts.
-&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; package provides golang friendly API to load, configure, feed input and get output.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/golang/TVM-Golang-Flow.png&quot; alt=&quot;image&quot; width=&quot;100%&quot; /&gt;&lt;/p&gt;
-&lt;center&gt; Import, Compile, Integrate and Deploy&lt;/center&gt;
-&lt;p&gt;&lt;/p&gt;
-
-&lt;p&gt;TVM &lt;a href=&quot;https://tvm.apache.org/docs//tutorials/#compile-deep-learning-models&quot;&gt;Compile Deep Learning Models&lt;/a&gt; tutorials
-are available to compile models from all frameworks supported by the TVM frontend. This compilation process
-generates the artifacts required to integrate and deploy the model on a target.&lt;/p&gt;
-
-&lt;h2 id=&quot;api&quot;&gt;API&lt;/h2&gt;
-
-&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; package provides a handful of datatypes and API functions to initialize, load and infer
-from a golang application. Like any other golang package we just need to import &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; package here.&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Module : The Module API can be used to load a TVM compiled module into TVM runtime and access any functions.&lt;/li&gt;
-  &lt;li&gt;Value : The Value API provides helper functions to set arguments or get return values in golang types like basic types or slices.&lt;/li&gt;
-  &lt;li&gt;Function : The Function API is useful for getting handles to functions and invoking them.&lt;/li&gt;
-  &lt;li&gt;Array : The Array API is useful for setting and getting Tensor data via golang slice.&lt;/li&gt;
-  &lt;li&gt;Context : The Context API contains helper functions to build backend context handles.&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;example&quot;&gt;Example&lt;/h2&gt;
-
-&lt;p&gt;A simple example with inline documentation of loading a compiled module and performing inference is shown below.
-For simplicity the error handling is ignored here, but is important in real applications.&lt;/p&gt;
-
-&lt;div class=&quot;language-cpp highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;
-&lt;span class=&quot;n&quot;&gt;package&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// Import compiled gotvm package.&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
-    &lt;span class=&quot;s&quot;&gt;&quot;./gotvm&quot;&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// Some constants for TVM compiled model paths.&lt;/span&gt;
-&lt;span class=&quot;c1&quot;&gt;// modLib : Is the compiled library exported out of compilation.&lt;/span&gt;
-&lt;span class=&quot;c1&quot;&gt;// modJson : TVM graph JSON.&lt;/span&gt;
-&lt;span class=&quot;c1&quot;&gt;// modParams : Exported params out of TVM compilation process.&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;modLib&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./libdeploy.so&quot;&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;modJSON&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./deploy.json&quot;&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;modParams&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./deploy.params&quot;&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-&lt;span class=&quot;c1&quot;&gt;// main&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;func&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
-    &lt;span class=&quot;c1&quot;&gt;// Some util API to query underlying TVM and DLPack version information.&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;fmt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;TVM Version   : v%v&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-    &lt;span class=&quot;n&quot;&gt;fmt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Printf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;DLPACK Version: v%v&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;& [...]
-
-    &lt;span class=&quot;c1&quot;&gt;// Import tvm module (so).&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;modp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;LoadModuleFromFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modLib&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// Load module on tvm runtime - call tvm.graph_runtime.create&lt;/span&gt;
-    &lt;span class=&quot;c1&quot;&gt;// with module and graph JSON.&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ioutil&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReadFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modJSON&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;jsonStr&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;string&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetGlobalFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;tvm.graph_runtime.create&quot;&lt;/span&gt;&lt;span cla [...]
-    &lt;span class=&quot;n&quot;&gt;graphrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;jsonStr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt [...]
-    &lt;span class=&quot;n&quot;&gt;graphmod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graphrt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AsModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-
-
-    &lt;span class=&quot;c1&quot;&gt;// Allocate input &amp;amp; output arrays and fill some data for input.&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;tshapeIn&lt;/span&gt;  &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;224&lt;/span&gt;&lt [...]
-    &lt;span class=&quot;n&quot;&gt;tshapeOut&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1001&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;inX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tshapeIn&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;sp [...]
-    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gotvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Empty&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tshapeOut&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;inSlice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;make&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;244&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;244&lt;/span&gt [...]
-    &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Seed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Shuffle&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;sp [...]
-                                               &lt;span class=&quot;n&quot;&gt;inSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
-                                               &lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;inX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CopyFrom&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// Load params&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ioutil&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReadFile&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;modParams&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graphmod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;load_params&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;& [...]
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-
-    &lt;span class=&quot;c1&quot;&gt;// Set module input&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graphmod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;set_input&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt [...]
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;input&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// Run or Execute the graph&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graphmod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;run&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt; [...]
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// Get output from runtime.&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graphmod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GetFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;get_output&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&g [...]
-    &lt;span class=&quot;n&quot;&gt;funp&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Invoke&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;int64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// Access output tensor data.&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;outIntf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;AsSlice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-    &lt;span class=&quot;n&quot;&gt;outSlice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outIntf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.([]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;float32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-
-    &lt;span class=&quot;c1&quot;&gt;// outSlice here holds flattened output data as a golang slice.&lt;/span&gt;
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
-
-&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; extends the TVM packed function system to support golang function closures as packed functions.
-&lt;a href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt; available to register golang
-closure as TVM packed function and invoke the same across programming language barriers.&lt;/p&gt;
-
-&lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/src&quot;&gt;Package Source&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;[1] &lt;a href=&quot;https://golang.org&quot;&gt;Go Programming Lang&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[2] &lt;a href=&quot;https://blog.golang.org/godoc-documenting-go-code&quot;&gt;Go Documentation Guide Lines&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[3] &lt;a href=&quot;https://golang.org/pkg/testing&quot;&gt;Go Testcase Framework&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[4] &lt;a href=&quot;https://golang.org/cmd/cgo&quot;&gt;Go CFFI&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[5] &lt;a href=&quot;https://blog.learngoprogramming.com/golang-variadic-funcs-how-to-patterns-369408f19085&quot;&gt;Go Variadic Functions&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[6] &lt;a href=&quot;https://github.com/jdeng/gomxnet&quot;&gt;CFFI Ref&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;[7] &lt;a href=&quot;https://golang.org/pkg/runtime/#SetFinalizer&quot;&gt;Go Finalizers&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;</content><author><name>Siva</name></author><summary type="html">Introduction</summary></entry></feed>
\ No newline at end of file
+&lt;p&gt;See also the &lt;a href=&quot;https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/&quot;&gt;Allen School news about the transition here&lt;/a&gt;, &lt;a href=&quot;https://sampl.cs.washington.edu/tvmconf/#about-tvmconf&quot;&gt;TVM conference program slides and recordings&lt;/a&gt;, and &lt;a href=&quot;https://tvm.apache.org/docs//contribute/community.html&quot;&gt;our community guideline here&lt;/a&gt;. Follow us o [...]
\ No newline at end of file
diff --git a/images/tvm-unity/image1.png b/images/tvm-unity/image1.png
new file mode 100644
index 0000000..616a144
Binary files /dev/null and b/images/tvm-unity/image1.png differ
diff --git a/images/tvm-unity/image2.png b/images/tvm-unity/image2.png
new file mode 100644
index 0000000..a23cd2e
Binary files /dev/null and b/images/tvm-unity/image2.png differ
diff --git a/images/tvm-unity/image3.png b/images/tvm-unity/image3.png
new file mode 100644
index 0000000..4a11da3
Binary files /dev/null and b/images/tvm-unity/image3.png differ
diff --git a/images/tvm-unity/image4.png b/images/tvm-unity/image4.png
new file mode 100644
index 0000000..d8d7657
Binary files /dev/null and b/images/tvm-unity/image4.png differ
diff --git a/rss.xml b/rss.xml
index 0e4aa40..0a697fb 100644
--- a/rss.xml
+++ b/rss.xml
@@ -5,12 +5,126 @@
         <description>TVM - </description>
         <link>https://tvm.apache.org</link>
         <atom:link href="https://tvm.apache.org" rel="self" type="application/rss+xml" />
-        <lastBuildDate>Wed, 24 Nov 2021 16:21:44 -0800</lastBuildDate>
-        <pubDate>Wed, 24 Nov 2021 16:21:44 -0800</pubDate>
+        <lastBuildDate>Wed, 15 Dec 2021 19:32:17 -0800</lastBuildDate>
+        <pubDate>Wed, 15 Dec 2021 19:32:17 -0800</pubDate>
         <ttl>60</ttl>
 
 
         <item>
+                <title>Apache TVM Unity: a vision for the ML software &amp; hardware ecosystem in 2022</title>
+                <description>&lt;p&gt;Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader shift coming in the way that machine learning system stacks optimize for flexibility and agility in the face of a rapidly changing hardware landscape. TVM will evolve to break down the boundaries that constrain the ways current ML systems adapt to rapid changes in ML models and the accelerators that implement them.&lt;/p&gt;
+
+&lt;h2 id=&quot;boundaries-in-the-modern-ml-system-stack&quot;&gt;Boundaries in the Modern ML System Stack&lt;/h2&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image4.png&quot; alt=&quot;image&quot; style=&quot;width: 40%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;The system stack for modern machine learning consists of four kinds of abstractions:&lt;/p&gt;
+&lt;ol&gt;
+  &lt;li&gt;The &lt;em&gt;computational graph&lt;/em&gt; abstraction encodes the flow of data between coarse-grained tensor operators. Computational graphs are the high-level abstraction users interact with in &lt;a href=&quot;https://www.tensorflow.org/&quot;&gt;TensorFlow&lt;/a&gt;, &lt;a href=&quot;https://mxnet.apache.org/&quot;&gt;MXNet&lt;/a&gt;, and &lt;a href=&quot;https://pytorch.org/&quot;&gt;PyTorch&lt;/a&gt;.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Tensor programs&lt;/em&gt; implement the code for the operators in the computational graph. Deep learning compilers generate the low-level C++ or CUDA code for computations like convolutions or matrix multiplications.&lt;/li&gt;
+  &lt;li&gt;Similarly, &lt;em&gt;libraries and runtimes&lt;/em&gt; include pre-written code to execute and orchestrate tensor operations. BLAS packages and libraries like cuDNN provide extensively tuned operator implementations for specific hardware targets.&lt;/li&gt;
+  &lt;li&gt;&lt;em&gt;Hardware primitives&lt;/em&gt; are at the bottom of the stack. Here, low-level assembly languages and hardware accelerator interfaces expose the raw capabilities of the machine.&lt;/li&gt;
+&lt;/ol&gt;
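+
+&lt;p&gt;To make the four levels concrete, a minimal sketch can trace one operator through the top of the stack: a tiny graph is defined at the first level, and compilation lowers it to the tensor programs, library calls, and hardware primitives below.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tvm
+from tvm import relay
+
+# Level 1: a computational graph with a single dense operator.
+x = relay.var('x', shape=(1, 784), dtype='float32')
+w = relay.var('w', shape=(128, 784), dtype='float32')
+graph = relay.Function([x, w], relay.nn.dense(x, w))
+
+# Levels 2-4: lowering produces a tensor program for the operator,
+# which may bind to a tuned library or to hardware primitives.
+lib = relay.build(tvm.IRModule.from_expr(graph), target='llvm')
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;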
+
+&lt;p&gt;There are &lt;em&gt;vertical&lt;/em&gt; boundaries between the abstraction levels that prohibit cross-layer interactions and feedback between the levels. There is also a &lt;em&gt;horizontal&lt;/em&gt; boundary between two opposing ways that software stacks can treat the central tensor computation level. The horizontal boundary divides &lt;em&gt;library-based&lt;/em&gt; and &lt;em&gt;compilation-based&lt;/em&gt; approaches to tensor computation.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image1.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;Library-based frameworks rely on collections of pre-made, carefully tuned operator implementations as their computational workhorse. Compilation-based frameworks instead generate their own custom tensor operation code from scratch. Modern software stacks typically use one style or the other, but they don’t combine them: most deep learning frameworks are library-based, while most deep learning compilers cannot incorporate libraries and runtimes.&lt;/p&gt;
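+
+&lt;p&gt;The split is visible inside TVM itself, where the same matmul can either be dispatched to a pre-tuned BLAS routine or generated from a declarative description. A sketch; the library path assumes TVM was built with BLAS support:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import tvm
+from tvm import te
+from tvm.contrib import cblas
+
+n = 1024
+A = te.placeholder((n, n), name='A')
+B = te.placeholder((n, n), name='B')
+
+# Library-based: an extern call into a hand-tuned BLAS kernel.
+C_lib = cblas.matmul(A, B)
+
+# Compilation-based: describe the math; the compiler generates the loops.
+k = te.reduce_axis((0, n), name='k')
+C_gen = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k))
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;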
+
+&lt;p&gt;In the current landscape of ML systems, the boundaries between these layers tend to be strict. Neither approach is better than the other; each has trade-offs. Library-based stacks excel on standard styles of ML models because they benefit from years of engineering investment in common operators. On the other hand, the flexibility and automation of compilation-based frameworks can be better for emerging models that require new operators.&lt;/p&gt;
+
+&lt;p&gt;Vertical boundaries exist in both styles of software stack. AI applications start at the top of the stack and march through the layers from top to bottom. Frameworks choose data layout and operator fusion strategies at the graph level; then the tensor computations carry out the operators selected in the computational graph; and these operators map onto a fixed set of hardware primitives. It’s a one-shot, unidirectional workflow: performance constraints at the level of tensor pro [...]
+
+&lt;p&gt;Both vertical and horizontal boundaries are slowing down the pace of innovation in machine learning. New hardware accelerators are emerging with new levels of capability and performance, but harnessing them will require fluid collaboration between ML scientists, ML engineers, and hardware vendors, which these boundaries prevent. To cope with the rapid pace of change in ML systems, frameworks need to support &lt;strong&gt;incremental&lt;/strong&gt; evolution: incorporating new capabili [...]
+
+&lt;h2 id=&quot;tvm-unity&quot;&gt;TVM Unity&lt;/h2&gt;
+
+&lt;p&gt;The TVM Unity vision is about breaking down these barriers. The goal is to enable cross-layer interactions and automate their optimization. It is not to collapse the abstraction layers into a monolith: there is no “silver bullet” representation for AI programs that simultaneously enables optimization at every level. Instead, TVM Unity will build interfaces for the abstractions to interact and exchange information.&lt;/p&gt;
+
+&lt;p&gt;Removing the strict barriers between the levels in the system stack will enable new kinds of optimization that work jointly across the layers. A unified view of the entire system will let TVM automatically co-optimize decisions in the computation graph, the tensor operators, and the hardware mapping to search for the best possible implementation of an AI application. At the same time, TVM Unity will also serve as a communication substrate for interactions between ML scientists,  [...]
+
+&lt;h3 id=&quot;unifying-abstractions&quot;&gt;Unifying Abstractions&lt;/h3&gt;
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image2.png&quot; alt=&quot;image&quot; style=&quot;width: 70%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will focus on letting AI applications fluidly cross the boundaries between operator graphs, tensor programs, and hardware primitives. In TVM, a single Python program can define a core tensor operation, incorporate a custom hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt;
+&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm.script&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tir&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;relax&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;
+
+&lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;script&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ir_module&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyIRModule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
+    &lt;span class=&quot;c1&quot;&gt;# Define a TIR-based operation.
+&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prim_func&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;tir_mm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class= [...]
+                   &lt;span class=&quot;n&quot;&gt;W&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+                   &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt [...]
+        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;grid&lt;/span&gt;&lt;span cl [...]
+            &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;block&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;body&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+                &lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vk&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;sp [...]
+                &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;init&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+                    &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
+                &lt;span class=&quot;c1&quot;&gt;# Can be mapped onto HW intrinsics.
+&lt;/span&gt;                &lt;span class=&quot;n&quot;&gt;Y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vj&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;X&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vi&lt;/span&gt;&lt; [...]
+
+    &lt;span class=&quot;o&quot;&gt;@&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;
+    &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;relax_func&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span cl [...]
+        &lt;span class=&quot;k&quot;&gt;with&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dataflow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;():&lt;/span&gt;
+            &lt;span class=&quot;c1&quot;&gt;# Invoke the TIR code.
+&lt;/span&gt;            &lt;span class=&quot;n&quot;&gt;lv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span [...]
+            &lt;span class=&quot;n&quot;&gt;lv1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;m&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;sp [...]
+            &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tensor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lv2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;float32&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt; [...]
+            &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+
+        &lt;span class=&quot;c1&quot;&gt;# Invoke external update rule.
+&lt;/span&gt;        &lt;span class=&quot;n&quot;&gt;R&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;call_packed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;custom_inplace_update&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;gv0&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
+
+&lt;p&gt;This code has both a tensor program (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tir_mm&lt;/code&gt;) and a computational graph that includes it (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;relax_func&lt;/code&gt;). The high-level data flow can directly invoke the low-level tensor manipulation to build up a larger computation. The TVM runtime unifies the operator graph and compiler-based tensor computation to optimize the entire progra [...]
+
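+&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;custom_inplace_update&lt;/code&gt; call above resolves through TVM’s PackedFunc mechanism, so the update rule can be supplied from ordinary Python or C++. A minimal sketch of how such a rule could be registered; the function body here is an illustrative placeholder:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import numpy as np
+import tvm
+
+# Register a global packed function under the name the IR refers to.
+@tvm.register_func('custom_inplace_update')
+def custom_inplace_update(x):
+    # x is a tvm.nd.NDArray; this placeholder clamps it in place.
+    x.copyfrom(np.maximum(x.numpy(), 0.0))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+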
+&lt;p&gt;Additionally, TensorIR opens the door to exploiting hardware primitives through tensorization. Tensorization transforms loop-level programs into implementations that map onto the primitives a particular hardware target declares.&lt;/p&gt;
+
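+&lt;p&gt;As a concrete sketch of that idea, the long-standing tensor-expression API can already tensorize a loop nest onto an external microkernel. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;gemv_update&lt;/code&gt; kernel below is a hypothetical vendor-provided primitive; TensorIR’s automatic tensorization generalizes the same mapping:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import tvm
+from tvm import te
+
+# A matmul whose inner (16 x 64) tile is handed to a microkernel.
+N, M, L = 1024, 512, 64
+A = te.placeholder((N, L), name='A')
+B = te.placeholder((M, L), name='B')
+k = te.reduce_axis((0, L), name='k')
+C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name='C')
+
+def intrin_gemv(m, l):
+    # Declare the computation that the hardware primitive performs.
+    a = te.placeholder((l,), name='a')
+    b = te.placeholder((m, l), name='b')
+    kk = te.reduce_axis((0, l), name='kk')
+    c = te.compute((m,), lambda i: te.sum(a[kk] * b[i, kk], axis=kk), name='c')
+    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name='A', offset_factor=1, strides=[1])
+    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name='B', offset_factor=1, strides=[te.var('s1'), 1])
+    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name='C', offset_factor=1, strides=[1])
+
+    def intrin_func(ins, outs):
+        # Emit a call to the (hypothetical) external microkernel.
+        ib = tvm.tir.ir_builder.create()
+        aa, bb = ins
+        cc = outs[0]
+        ib.emit(tvm.tir.call_extern('int32', 'gemv_update',
+                                    cc.access_ptr('w'), aa.access_ptr('r'),
+                                    bb.access_ptr('r'), m, l, bb.strides[0]))
+        return ib.get()
+
+    return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})
+
+s = te.create_schedule(C.op)
+x, y = C.op.axis
+(z,) = C.op.reduce_axis
+yo, yi = s[C].split(y, factor=16)
+s[C].reorder(x, yo, yi, z)
+# The inner (yi, z) loops are now replaced by gemv_update calls.
+s[C].tensorize(yi, intrin_gemv(16, L))&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+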
+&lt;p&gt;The key point to highlight here is &lt;strong&gt;cross-layer interaction&lt;/strong&gt;. Our example shows interactions between (1) the computational graph and tensor programs; (2) the computational graph and runtime libraries; and (3) tensor programs and hardware primitives, through the ongoing automatic tensorization work in TensorIR. These cross-layer interactions open the door to &lt;strong&gt;incremental optimizations&lt;/strong&gt; at each boundary. For example, [...]
+
+&lt;p&gt;In addition to unifying the abstraction layers, we are also working on unifying the shape representation to enable &lt;strong&gt;first-class symbolic shape support&lt;/strong&gt; across the stack. In our example, the symbolic shape dimensions (n, m) can flow across the abstractions and enable advanced optimizations for dynamic workloads. These capabilities open the door to optimizing both training and inference workloads.&lt;/p&gt;
+
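+&lt;p&gt;The stable tensor-expression API already hints at this: shapes can be symbolic variables, so a single compiled kernel serves every concrete (n, m). A minimal sketch, assuming the llvm target:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import tvm
+from tvm import te
+
+# n and m stay symbolic through scheduling and code generation.
+n = te.var('n')
+m = te.var('m')
+X = te.placeholder((n, m), name='X')
+Y = te.compute((n, m), lambda i, j: X[i, j] * 2.0, name='Y')
+s = te.create_schedule(Y.op)
+# One compiled function handles all concrete values of n and m.
+f = tvm.build(s, [X, Y], target='llvm')&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+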
+&lt;h3 id=&quot;unifying-perspectives&quot;&gt;Unifying Perspectives&lt;/h3&gt;
+
+&lt;p&gt;Better ML systems require collaboration between ML scientists, ML engineers, and hardware engineers. The coming era of diverse specialized ML hardware will require coordinated effort from teams that include all three groups. By building rich, bidirectional interfaces between the layers in the system stack, TVM Unity aims to be the medium through which this collaboration and iteration happens.&lt;/p&gt;
+
+&lt;p&gt;Abstractions in TVM can catalyze the lifecycle of an improvement to an AI application. At the highest level, an ML scientist can specify the operator they need to construct the next generation of a model. ML engineers can work at the tensor computation level to make this new operation efficient. Finally, these tensor computations can rely on hardware primitives written by hardware engineers. The work at each level will interact through Python APIs within the TVM ecosystem. The a [...]
+
+&lt;h3 id=&quot;automation&quot;&gt;Automation&lt;/h3&gt;
+
+&lt;p&gt;A unified ML system creates a new, larger search space than a system stack with strict boundaries. Decisions within tensor computations can influence the structure of the operator graph, and new hardware primitives can drastically change the optimal mappings at every other layer.&lt;/p&gt;
+
+&lt;p&gt;TVM Unity will expose all these cross-layer interactions for automated optimization. Finding the best implementation for a given application will require learning-driven optimization: using ML to optimize ML by exploring the expanded joint search space and minimizing the computational cost.&lt;/p&gt;
+
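+&lt;p&gt;The auto-scheduler (Ansor) already embodies this approach at the tensor-program level; Unity extends the search across layers. A minimal single-task sketch, in which the workload, trial count, and log file name are arbitrary:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import tvm
+from tvm import te, auto_scheduler
+
+@auto_scheduler.register_workload
+def matmul(N, M, K):
+    A = te.placeholder((N, K), name='A')
+    B = te.placeholder((K, M), name='B')
+    k = te.reduce_axis((0, K), name='k')
+    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name='C')
+    return [A, B, C]
+
+task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024),
+                                 target=tvm.target.Target('llvm'))
+log_file = 'matmul.json'
+task.tune(auto_scheduler.TuningOptions(
+    num_measure_trials=64,
+    measure_callbacks=[auto_scheduler.RecordToFile(log_file)]))
+# Apply the best schedule found by the search.
+sch, args = task.apply_best(log_file)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+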
+&lt;p&gt;In addition, we want to leverage the help of domain experts where possible, and to create mechanisms that effectively incorporate their domain knowledge to guide the automatic optimizations.&lt;/p&gt;
+
+&lt;h2 id=&quot;new-capabilities-with-unity&quot;&gt;New Capabilities with Unity&lt;/h2&gt;
+
+&lt;p&gt;The Unity vision guides the technical roadmap for TVM’s evolution over the next year. The unified approach will position TVM to offer new forms of automation and ecosystem integration that are not possible with today’s system stacks.&lt;/p&gt;
+
+&lt;p&gt;With Unity, TVM will unify library-based computation with compiler-based automation. AI applications will be able to combine the world’s best-known code for common operators with automatically optimized code for computations that don’t map neatly onto any existing operator. Developers will be able to smoothly transition between both strategies without a steep “performance cliff” when switching from built-in to generated code. Teams will be able to iterate rapidly with compiled c [...]
+
+&lt;p&gt;TVM also aims to serve as a bridge to unify the broader ML and hardware ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not constrain teams’ choice of frameworks. TVM models will be easy to embed into other frameworks and runtimes as subgraphs for both training and inference. Through exchange formats like &lt;a href=&quot;https://onnx.ai/&quot;&gt;ONNX&lt;/a&gt; and &lt;a href=&quot;https://pytorch.org/docs/stable/jit.html&quot;&gt;TorchScript&lt;/a&gt;,  [...]
+
+&lt;p&gt;&lt;img src=&quot;/images/tvm-unity/image3.png&quot; alt=&quot;image&quot; style=&quot;width: 50%; margin: auto; display: block;&quot; /&gt;&lt;/p&gt;
+
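+&lt;p&gt;For example, a model exported to ONNX can be imported and compiled with today’s Relay frontend. A minimal sketch, in which the file name and input name/shape are placeholders:&lt;/p&gt;
+
+&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import onnx
+import tvm
+from tvm import relay
+
+# 'model.onnx' and the input name/shape below are placeholders.
+onnx_model = onnx.load('model.onnx')
+mod, params = relay.frontend.from_onnx(
+    onnx_model, shape={'data': (1, 3, 224, 224)})
+with tvm.transform.PassContext(opt_level=3):
+    lib = relay.build(mod, target='llvm', params=params)&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
+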
+&lt;p&gt;Beyond TVM alone, the same forces that are driving TVM Unity exist across the theory and practice of modern ML. Rapid changes to models, emerging alternative hardware, and aging abstraction boundaries all point toward the need for an integrated approach. We expect TVM to lead the way into the next great industry-wide shift in ML systems.&lt;/p&gt;
+
+&lt;p&gt;For more details about our vision for TVM, check out &lt;a href=&quot;https://www.tvmcon.org&quot;&gt;TVMCon 2021&lt;/a&gt;, which features additional talks and discussion.&lt;/p&gt;
+</description>
+                <link>https://tvm.apache.org/2021/12/15/tvm-unity</link>
+                <guid>https://tvm.apache.org/2021/12/15/tvm-unity</guid>
+                <pubDate>Wed, 15 Dec 2021 00:00:00 -0800</pubDate>
+        </item>
+
+        <item>
                 <title>Introducing TVM Auto-scheduler (a.k.a. Ansor)</title>
                 <description>&lt;p&gt;Optimizing the execution speed of deep neural networks is extremely hard with the growing
 model size, operator diversity, and hardware heterogeneity.
@@ -147,7 +261,7 @@ sparse operators, low-precision operators, and dynamic shape better.&lt;/p&gt;
 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
 &lt;p&gt;When designing accelerators, an important decision is how one will approximately represent real numbers in hardware.
-This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
+This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.&lt;sup id=&quot;fnref:ieee&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:ieee&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;
 Yet,
   when trying to squeeze
   the most out of hardware
@@ -167,7 +281,7 @@ Due to the lax numerical requirements
   this truncation often has no effect
   on model accuracy,
   while instantly cutting the storage cost
-  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
+  in half.&lt;sup id=&quot;fnref:jouppi2017datacenter&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:jouppi2017datacenter&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:tensorflowbfloat&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:tensorflowbfloat&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
 
 &lt;p&gt;Before researchers begin building hardware for their datatype, however, they first need to determine how their datatype will behave numerically in the workloads they care about.
 This often involves first building a software-emulated version of their datatype
@@ -237,7 +351,7 @@ These steps are akin to
   &lt;em&gt;declaration&lt;/em&gt; and &lt;em&gt;implementation&lt;/em&gt; of the datatype,
   respectively.&lt;/p&gt;
 
-&lt;p&gt;Please note that all code referred to in this post is based on the TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
+&lt;p&gt;Please note that all code referred to in this post is based on the TVM repository’s master branch commit &lt;a href=&quot;https://github.com/apache/incubator-tvm/tree/4cad71d19fda6d8f7b750c791284c6dfdddf1f07&quot; target=&quot;_blank&quot;&gt;4cad71d&lt;/a&gt;. We will use an example &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;posit&lt;/code&gt; datatype which can be found under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;src/target/dataty [...]
 
 &lt;h3 id=&quot;datatype-registration&quot;&gt;Datatype Registration&lt;/h3&gt;
 
@@ -1121,7 +1235,7 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
                 &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                     &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'...'&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;attr_str&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;
-            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;f'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}{&lt;/span&gt;&lt [...]
+            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg_str&lt;/span&gt;&lt;s [...]
             &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt [...]
             &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args_with_edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
                 &lt;span class=&quot;n&quot;&gt;dot&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;node_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]),&lt;/span [...]
@@ -1467,7 +1581,7 @@ A standard µTVM setup, where the host communicates with the device via JTAG.&lt
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph_runtime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;micro_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt [...]
   &lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data_np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CIFAR10_CLASSES&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;argmax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;graph_mod&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt; [...]
-  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;f'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+  &lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sa&quot;&gt;f&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'prediction was &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;Below are the performance results of MicroTVM, compared with &lt;a href=&quot;https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0&quot;&gt;CMSIS-NN version 5.7.0&lt;/a&gt; (commit &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a65b7c9a&lt;/code&gt;), a hand-optimized library of ML kernels.&lt;/p&gt;
@@ -4406,232 +4520,6 @@ make jvminstall
                 <pubDate>Wed, 08 Nov 2017 00:00:00 -0800</pubDate>
         </item>
 
-        <item>
-                <title>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm</title>
-                <description>&lt;p style=&quot;text-align: center&quot;&gt;Aditya Atluri, Advanced Micro Devices, Inc.&lt;/p&gt;
-&lt;p style=&quot;text-align: center&quot;&gt;Masahiro Masuda, Ziosoft, Inc.&lt;/p&gt;
-
-&lt;p&gt;We are pleased to announce a new GPU backend for the TVM stack: a ROCm backend for AMD GPUs. If you are not familiar with TVM, you can refer to &lt;a href=&quot;http://tvmlang.org/2017/08/17/tvm-release-announcement.html&quot;&gt;the earlier announcement&lt;/a&gt; first. In short, the TVM stack is an end-to-end compilation stack for deploying deep learning workloads to all hardware backends. Today’s announcement focuses on code generator support for AMD GPUs. Specifically, we developed a [...]
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/tvm_rocm_overview.png&quot; alt=&quot;image&quot; width=&quot;90%&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;The TVM stack is developed by an open-source community under the Apache-2.0 License. ROCm backend support was done with help from the community. Aditya first implemented the codegen and runtime. He was later joined by Masahiro. Masahiro’s full-time job is not related to TVM or AMD GPUs. Nonetheless, TVM got him excited, and he has been involved in fixing bugs, resolving all failing unit tests, and adding math function support to the codegen.&lt;/p&gt;
-
-&lt;h2 id=&quot;rocm-stack&quot;&gt;ROCm stack&lt;/h2&gt;
-
-&lt;p&gt;Radeon Open Compute is an open-source initiative by AMD to leverage the compute power of current and future generation GPUs. The ROCm software stack is a great tool for expressing and running the most commonly used GPU programming models and achieving peak performance. Not only is ROCm an open-source stack, it is an open stack, which means all the ISA and hardware features are well documented and programmable by developers. Developers can experiment with different programming models and try out multiple [...]
-
-&lt;p&gt;TVM leverages the open-source feature of ROCm stack by using LLVM AMDGPU backend code generator. TVM translates from its intermediate representation (IR) to LLVM intermediate representation. This is the place where ROCm stack open-source feature takes control. TVM’s LLVM AMDGPU CodeGen pass converts LLVM IR into GPU assembly and object code, which is later called to run the whole network or group of layers or single layer.&lt;/p&gt;
-
-&lt;p&gt;On the ROCm stack, there is no virtual ISA: you get what you ask for, no less and no more. Hence, one can schedule operations in a kernel at the granularity of a single instruction, without worrying about instruction reordering and other optimizations you did not ask for.&lt;/p&gt;
-
-&lt;h2 id=&quot;using-nnvm-compiler-with-rocm-backend&quot;&gt;Using NNVM Compiler with ROCm backend&lt;/h2&gt;
-
-&lt;p&gt;Thanks to the TVM stack, we can today directly compile models from popular deep learning frameworks such as MXNet and PyTorch into AMD GPU assembly using the NNVM compiler. With the ROCm backend, the generic workflow becomes as follows.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/rocm_workflow.png&quot; alt=&quot;image&quot; width=&quot;90%&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;We have put together working examples of compiling models from MXNet and PyTorch with NNVM, and running them on AMD GPUs with ROCm backend. More frameworks are supported via the NNVM compiler stack. The repository is available &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;The script &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/mxnet_imagenet_inference.py&quot;&gt;mxnet_imagenet_inference.py&lt;/a&gt; demonstrates Imagenet inference on AMD GPUs with recently introduced MXNet-Gluon model. It does the following:&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Loads Resnet 50 model from &lt;a href=&quot;https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html&quot;&gt;the Gluon model zoo&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Converts Gluon Resnet 50 model to NNVM graph format, using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nnvm.frontend.from_mxnet (...)&lt;/code&gt;&lt;/li&gt;
-  &lt;li&gt;Compiles and executes the graph with ROCm backend&lt;/li&gt;
-&lt;/ul&gt;
-
-&lt;p&gt;The example comes with an image of the following cat.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/cat.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;Running our network, it predicts this image as “tiger cat”, among 1000 categories.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-plain&quot; data-lang=&quot;plain&quot;&gt;$ python mxnet_imagenet_inference.py
-Testing model resnet50_v1
-x (1, 3, 224, 224)
-TVM prediction top-1: 282 tiger cat&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;The script &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/advanced_superres_onnx.py&quot;&gt;advanced_superres_onnx.py&lt;/a&gt; gives an example of loading a model trained with PyTorch. The model is stored in the &lt;a href=&quot;https://onnx.ai/&quot;&gt;ONNX&lt;/a&gt; format. In this example, our network takes a low-resolution image as input and outputs a 4x high-resolution image. We refer the details of the problem setup and the network archit [...]
-
-&lt;p&gt;In order to use models in the ONNX format with NNVM, we first use &lt;a href=&quot;https://github.com/onnx/onnx&quot;&gt;the ONNX library&lt;/a&gt; to load the ONNX model into the Protocol buffer object. We can then use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;nnvm.frontend.from_onnx(...)&lt;/code&gt; to obtain an equivalent NNVM graph. With a NNVM graph in hand, we can follow the generic workflow of compilation and graph execution outlined above.&lt;/p&gt;
-
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img src=&quot;/images/rocm/butterfly.png&quot; alt=&quot;image&quot; /&gt;&lt;/p&gt;
-
-&lt;p&gt;The input to the network is a 64 x 64 image on the left, and it outputs a 256 x 256 image on the right. In the middle is a 256 x 256 image obtained simply by resizing the input image with bicubic interpolation. The network outputs an image of far better quality.&lt;/p&gt;
-
-&lt;p&gt;The input images are taken from the original paper, and they are available &lt;a href=&quot;https://twitter.app.box.com/s/lcue6vlrd01ljkdtdkhmfvk7vtjhetog&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
-
-&lt;h2 id=&quot;a-note-on-performance&quot;&gt;A Note on performance&lt;/h2&gt;
-
-&lt;p&gt;The current support on ROCm focuses on functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for the CUDA backend. For example, you can try running &lt;a href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the gemm test script&lt;/a&gt; in the TVM repository and see the result. For the two types of cards we tested, the current gemm recipe for square matrix multiplica [...]
-This is already a promising start, as it is very hard to optimize performance to get to peak and we
-did not yet apply AMD GPU specific optimizations.
-We are starting to look at performance optimization and we expect more improvement to come.&lt;/p&gt;
-
-&lt;h2 id=&quot;walkthrough-of-rocm-backend&quot;&gt;Walkthrough of ROCm backend&lt;/h2&gt;
-
-&lt;p&gt;In the following part of this article, we focus on explaining how to use the ROCm backend when working with TVM directly. All you need to do is build your TVM function under the target “rocm” and create a runtime context for it. Here, we show an example of ROCm backend usage, following the ‘Vector Add Example’ in TVM’s &lt;a href=&quot;http://docs.tvmlang.org/tutorials/get_started.html#vector-add-example&quot;&gt;getting started tutorial&lt;/a&gt;.&lt;/p&gt;
-
-&lt;p&gt;We start by setting up a compute operation and a schedule for the vector add kernel. This step is independent of a backend.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;__future__&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;absolute_import&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;print_function&lt;/span&gt;
-&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;tvm&lt;/span&gt;
-&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;
-
-&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;n&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span cl [...]
-&lt;span class=&quot;n&quot;&gt;B&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;placeholder&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span cl [...]
-&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;shape&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&q [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_schedule&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;op&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;bx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&q [...]
-&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bind&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tx&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&q [...]
-
-&lt;p&gt;Next, to use the ROCm backend, we build our kernel under the “rocm” target. This causes TVM to use our new code generator. We also need a runtime context for the ROCm backend.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;rocm&quot;&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;A&lt;/span&gt;&lt;span class= [...]
-&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;After building the kernel and setting up a runtime context, we can launch our vector add kernel.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1024&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tvm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;array&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n [...]
-
-&lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
-&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;testing&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;assert_allclose&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;asnumpy&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &l [...]
-
-&lt;p&gt;We can view LLVM IR that TVM generates in the following way:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fadd_rocm&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imported_modules&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;spa [...]
-&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;llvm&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;You should see something like this:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-llvm&quot; data-lang=&quot;llvm&quot;&gt;&lt;span class=&quot;c1&quot;&gt;; ModuleID = 'myadd__kernel0'&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;source_filename&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;myadd__kernel0&quot;&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;datalayout&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64&quot;&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;triple&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;amdgcn-amd-amdhsa-hcc&quot;&lt;/span&gt;
-
-
-&lt;span class=&quot;c1&quot;&gt;; Function Attrs: nounwind&lt;/span&gt;
-&lt;span class=&quot;k&quot;&gt;define&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;dllexport&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;amdgpu_kernel&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@myadd__kernel0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=& [...]
-&lt;span class=&quot;nl&quot;&gt;entry:&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workgroup.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;tail&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;call&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;vg&quot;&gt;@llvm.amdgcn.workitem.id.x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%6&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;-127&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%7&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ashr&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%6&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%8&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;icmp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;slt&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%7&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i1&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_then&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_else&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_then:&lt;/span&gt;                                          &lt;span class=&quot;c1&quot;&gt;; preds = %entry&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%9&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;shl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end.sink.split&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_end.sink.split:&lt;/span&gt;                                &lt;span class=&quot;c1&quot;&gt;; preds = %if_else, %if_then&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;phi&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span [...]
-  &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%.pre-phi&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%12&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%11&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%13&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
-  &lt;span class=&quot;nv&quot;&gt;%15&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;load&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; [...]
-  &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;fadd&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%14&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%16&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%18&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sext&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%10&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i64&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%19&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;getelementptr&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;inbounds&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&g [...]
-  &lt;span class=&quot;k&quot;&gt;store&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%17&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;float&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;addrspace&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;m&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)*&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%19&lt;/span [...]
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_end:&lt;/span&gt;                                           &lt;span class=&quot;c1&quot;&gt;; preds = %if_end.sink.split, %if_else&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;
-
-
-&lt;span class=&quot;nl&quot;&gt;if_else:&lt;/span&gt;                                          &lt;span class=&quot;c1&quot;&gt;; preds = %entry&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%20&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;sub&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%5&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;shl&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;nsw&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;m&quot;&gt;6&lt;/span&gt;
-  &lt;span class=&quot;nv&quot;&gt;%22&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;icmp&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;slt&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%21&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%20&lt;/span&gt;
-  &lt;span class=&quot;k&quot;&gt;br&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i1&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%22&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end.sink.split&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;label&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;%if_end&lt;/span&gt;&lt;span class=&quot;p&quot;& [...]
-&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;We can also view the GPU assembly that the ROCm backend generates. This is the real code that runs on your GPU.&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev_module&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_source&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;asm&quot;&lt;/span&gt;&lt;sp [...]
-
-&lt;p&gt;The assembly should look something like this, omitting unnecessary details:&lt;/p&gt;
-
-&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-plain&quot; data-lang=&quot;plain&quot;&gt;        s_load_dword s1, s[4:5], 0x18
-        v_mov_b32_e32 v2, -1
-        v_mov_b32_e32 v1, 0
-        s_waitcnt lgkmcnt(0)
-        s_add_i32 s0, s1, 0xffffff81
-        s_ashr_i32 s0, s0, 6
-        s_cmp_ge_i32 s6, s0
-        s_cbranch_scc0 BB0_2
-        v_sub_i32_e32 v1, vcc, s1, v0
-        s_lshl_b32 s0, s6, 6
-        v_cmp_lt_i32_e32 vcc, s0, v1
-        v_mov_b32_e32 v2, 0
-        v_cndmask_b32_e64 v1, 0, -1, vcc
-BB0_2:
-        v_cmp_ne_u32_e32 vcc, 0, v2
-        v_cndmask_b32_e64 v2, 0, 1, vcc
-        v_cmp_ne_u32_e32 vcc, 1, v2
-        s_and_b64 vcc, exec, vcc
-        s_cbranch_vccnz BB0_4
-        s_lshl_b32 s0, s6, 6
-        v_mov_b32_e32 v1, -1
-BB0_4:
-        v_cmp_ne_u32_e32 vcc, 0, v1
-        v_mov_b32_e32 v1, s0
-        s_and_saveexec_b64 s[0:1], vcc
-        s_xor_b64 s[0:1], exec, s[0:1]
-        s_cbranch_execz BB0_6
-BB0_5:
-        s_load_dwordx2 s[2:3], s[4:5], 0x0
-        s_load_dwordx2 s[6:7], s[4:5], 0x8
-        v_add_i32_e32 v0, vcc, v1, v0
-        s_load_dwordx2 s[4:5], s[4:5], 0x10
-        v_ashrrev_i32_e32 v1, 31, v0
-        v_lshlrev_b64 v[0:1], 2, v[0:1]
-        s_waitcnt lgkmcnt(0)
-        v_add_i32_e32 v2, vcc, s4, v0
-        v_mov_b32_e32 v3, s5
-        v_addc_u32_e32 v3, vcc, v3, v1, vcc
-        flat_load_dword v2, v[2:3]
-        v_add_i32_e32 v4, vcc, s6, v0
-        v_mov_b32_e32 v3, s7
-        v_addc_u32_e32 v5, vcc, v3, v1, vcc
-        flat_load_dword v4, v[4:5]
-        v_mov_b32_e32 v3, s3
-        v_add_i32_e32 v0, vcc, s2, v0
-        v_addc_u32_e32 v1, vcc, v3, v1, vcc
-        s_waitcnt vmcnt(0) lgkmcnt(0)
-        v_add_f32_e32 v2, v2, v4
-        flat_store_dword v[0:1], v2
-BB0_6:
-        s_or_b64 exec, exec, s[0:1]
-        s_endpgm&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
-
-&lt;p&gt;Links&lt;/p&gt;
-
-&lt;ul&gt;
-  &lt;li&gt;Github page of NNVM Compiler: &lt;a href=&quot;https://github.com/dmlc/nnvm&quot;&gt;https://github.com/dmlc/nnvm&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Github page of TVM: &lt;a href=&quot;https://github.com/dmlc/tvm&quot;&gt;https://github.com/dmlc/tvm&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;Examples of ROCm backend with NNVM: &lt;a href=&quot;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&quot;&gt;https://github.com/ROCmSoftwarePlatform/nnvm-rocm&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;
-</description>
-                <link>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</link>
-                <guid>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</guid>
-                <pubDate>Mon, 30 Oct 2017 00:00:00 -0700</pubDate>
-        </item>
-
 
 </channel>
 </rss>
diff --git a/sitemap.txt b/sitemap.txt
index db8795d..a9a602d 100644
--- a/sitemap.txt
+++ b/sitemap.txt
@@ -16,6 +16,7 @@ https://tvm.apache.org/vta
 https://tvm.apache.org/feed.xml
 https://tvm.apache.org/css/custom.css.map
 
+https://tvm.apache.org/2021/12/15/tvm-unity
 https://tvm.apache.org/2021/03/03/intro-auto-scheduler
 https://tvm.apache.org/2020/09/26/bring-your-own-datatypes
 https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm