Posted to commits@tvm.apache.org by lm...@apache.org on 2020/11/13 11:22:24 UTC

[incubator-tvm-site] branch asf-site updated: Build at Fri Nov 13 03:22:10 PST 2020

This is an automated email from the ASF dual-hosted git repository.

lmzheng pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new ef48c37  Build at Fri Nov 13 03:22:10 PST 2020
ef48c37 is described below

commit ef48c37e3fc749868336eb80b98371fed51241f0
Author: Lianmin Zheng <li...@gmail.com>
AuthorDate: Fri Nov 13 03:22:10 2020 -0800

    Build at Fri Nov 13 03:22:10 PST 2020
---
 2017/08/17/tvm-release-announcement.html           |  280 +-
 ...s-with-TVM-A-Depthwise-Convolution-Example.html |  519 ++--
 2017/10/06/nnvm-compiler-announcement.html         |  280 +-
 ...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html |  338 +--
 2017/11/08/android-rpc-introduction.html           |  449 +--
 2018/01/16/opt-mali-gpu.html                       |  496 ++--
 2018/03/12/webgl.html                              |  280 +-
 2018/03/23/nmt-transformer-optimize.html           |  319 +-
 2018/07/12/vta-release-announcement.html           |  290 +-
 2018/08/10/DLPack-Bridge.html                      |  362 +--
 2018/10/03/auto-opt-all.html                       |  294 +-
 2018/10/09/ml-in-tees.html                         |  280 +-
 2018/12/18/lowprecision-conv.html                  |  315 +-
 2019/01/19/Golang.html                             |  367 +--
 2019/03/18/tvm-apache-announcement.html            |  328 +--
 2019/04/29/opt-cuda-quantized.html                 |  339 +--
 2019/05/30/pytorch-frontend.html                   |  305 +-
 ...machine-learning-to-webassembly-and-webgpu.html |  280 +-
 2020/05/20/bring-your-own-datatypes.html           |  485 ++++
 2020/06/04/tinyml-how-tvm-is-taming-tiny.html      |  420 +--
 2020/07/14/bert-pytorch-tvm.html                   |  638 ++--
 .../15/how-to-bring-your-own-codegen-to-tvm.html   |  619 ++--
 2020/09/26/bring-your-own-datatypes.html           |  483 ----
 404.html                                           |   73 +-
 about.html                                         |  226 +-
 asf.html                                           |  324 ++-
 assets/images/Flexibility.svg                      |    3 -
 assets/images/about-image.svg                      |   91 -
 assets/images/about-responsive-image.svg           |  413 ---
 assets/images/banner-bg.jpg                        |  Bin 327674 -> 0 bytes
 assets/images/close-icon.svg                       |    3 -
 assets/images/communitybg.svg                      |  315 --
 assets/images/dropdown-icon.svg                    |    3 -
 assets/images/favicon.ico                          |  Bin 15406 -> 0 bytes
 assets/images/flexbg.svg                           |  138 -
 assets/images/leftslide.svg                        |    3 -
 assets/images/logo.svg                             |    9 -
 assets/images/menu-icon.svg                        |    5 -
 assets/images/modal-close-icon.svg                 |    3 -
 assets/images/pattern.png                          |  Bin 84215 -> 0 bytes
 assets/images/performance.svg                      |  206 --
 assets/images/right.svg                            |    3 -
 assets/images/run.svg                              |    3 -
 assets/images/runbg.svg                            |  212 --
 assets/images/speed.svg                            |    7 -
 assets/images/use.svg                              |    3 -
 assets/images/usebg.svg                            |  216 --
 assets/js/custome.js                               |   27 -
 assets/js/slick.js                                 | 3037 --------------------
 .../bootstrap/css/bootstrap.2.2.2.min.css          |  782 +++++
 .../bootstrap/img/glyphicons-halflings-white.png   |  Bin 0 -> 8777 bytes
 .../bootstrap/img/glyphicons-halflings.png         |  Bin 0 -> 12799 bytes
 .../themes/custom-twitter/css/1.4.0/bootstrap.css  |  356 +++
 assets/themes/custom-twitter/css/style.css         |  428 +++
 atom.xml                                           | 2313 ++++++++-------
 blog.html                                          |  292 +-
 categories.html                                    |  288 +-
 community.html                                     |  522 ++--
 css/custom.css                                     |  581 ----
 css/custom.css.map                                 |   12 -
 css/slick-theme.css                                |  186 --
 css/slick.css                                      |  119 -
 css/variables.scss                                 |    3 -
 download.html                                      |  315 +-
 feed.xml                                           | 2197 --------------
 history.html                                       |  197 --
 images/bring-your-own-datatypes/lowering.png       |  Bin 26104 -> 155592 bytes
 images/community/alicloud.png                      |  Bin 3461 -> 20301 bytes
 images/community/amd.png                           |  Bin 1287 -> 8823 bytes
 images/community/arm.png                           |  Bin 1401 -> 2050 bytes
 images/community/aws.png                           |  Bin 3365 -> 32934 bytes
 images/community/edgecortix.png                    |  Bin 3824 -> 94207 bytes
 images/community/facebookopen.png                  |  Bin 6853 -> 0 bytes
 images/community/huawei.png                        |  Bin 4222 -> 7913 bytes
 images/community/intel.png                         |  Bin 2843 -> 4489 bytes
 images/community/itri_tw.png                       |  Bin 77552 -> 0 bytes
 images/community/microsoft.png                     |  Bin 2158 -> 24164 bytes
 images/community/nvidia.png                        |  Bin 2408 -> 48816 bytes
 images/community/oasislabs.png                     |  Bin 3652 -> 37771 bytes
 images/community/octoml.png                        |  Bin 2705 -> 0 bytes
 images/community/qualcommic.png                    |  Bin 4370 -> 32799 bytes
 images/community/ucberkeley.png                    |  Bin 4101 -> 32071 bytes
 images/community/ucla.png                          |  Bin 26748 -> 45279 bytes
 images/community/uwcse.png                         |  Bin 4791 -> 19344 bytes
 images/community/xilinx.png                        |  Bin 2071 -> 12702 bytes
 index.html                                         |  379 +--
 rss.xml                                            | 2319 ++++++++-------
 sitemap.txt                                        |    8 +-
 tags.html                                          |  288 +-
 vta.html                                           |  290 +-
 90 files changed, 9988 insertions(+), 15976 deletions(-)

diff --git a/2017/08/17/tvm-release-announcement.html b/2017/08/17/tvm-release-announcement.html
index 9b83eb3..2271f91 100644
--- a/2017/08/17/tvm-release-announcement.html
+++ b/2017/08/17/tvm-release-announcement.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
+
 
-</div>
 
 
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms </h1>
       <p class="post-meta">
-        <time datetime="2017-08-17T15:00:00-04:00" itemprop="datePublished">
+        <time datetime="2017-08-17T12:00:00-07:00" itemprop="datePublished">
           Aug 17, 2017
         </time>
         
@@ -274,41 +284,37 @@ that adopts the standard, such as MXNet, PyTorch, Caffe2 and tiny-dnn.</li>
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All right reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
index a03a6bf..514ed23 100644
--- a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
+++ b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Optimize Deep Learning GPU Operators with TVM: A Depthwise Convolution Example</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
+
 
-</div>
 
 
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Optimize Deep Learning GPU Operators with TVM: A Depthwise Convolution Example </h1>
       <p class="post-meta">
-        <time datetime="2017-08-22T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2017-08-22T00:00:00-07:00" itemprop="datePublished">
           Aug 22, 2017
         </time>
         
@@ -175,23 +185,24 @@ It’s an effective method to reduce the computation complexity of deep neural n
 
 <p>In TVM, depthwise convolution can be declared as:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># padding stage
-</span><span class="n">PaddedInput</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># padding stage</span>
+<span class="n">PaddedInput</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
     <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">in_channel</span><span class="p">,</span> <span class="n">height_after_pad</span><span class="p">,</span> <span class="n">width_after_pad</span><span class="p">),</span>
-    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="p">.</span><span class="n">select</span><span class="p">(</span>
-        <span class="n">tvm</span><span class="p">.</span><span class="nb">all</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span> <span class="o">&lt;</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">j</span> <span class="o">&gt;=</span> <span class="n">pad_left</span><span class="p">,</span [...]
-        <span class="n">Input</span><span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">j</span> <span class="o">-</span> <span class="n">pad_left</span><span class="p">],</span> <span class="n">tvm</span><span class="p">.</span><span class="n">const</span><span class="p">(</span><span class="mf">0.0 [...]
+    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="n">select</span><span class="p">(</span>
+        <span class="n">tvm</span><span class="o">.</span><span class="nb">all</span><span class="p">(</span><span class="n">i</span> <span class="o">&gt;=</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span> <span class="o">&lt;</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">j</span> <span class="o">&gt;=</span> <span class="n">pad_left</span><span class="p">,</span [...]
+        <span class="n">Input</span><span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">j</span> <span class="o">-</span> <span class="n">pad_left</span><span class="p">],</span> <span class="n">tvm</span><span class="o">.</span><span class="n">const</span><span class="p">(</span><span class="mf">0.0 [...]
     <span class="n">name</span><span class="o">=</span><span class="s">"PaddedInput"</span><span class="p">)</span>
-<span class="c1"># depthconv stage
-</span><span class="n">di</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_height</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'di'</span><span class="p">)</span>
-<span class="n">dj</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_width</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'dj'</span><span class="p">)</span>
-<span class="n">Output</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span>
+<span class="c"># depthconv stage</span>
+<span class="n">di</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_height</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'di'</span><span class="p">)</span>
+<span class="n">dj</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">filter_width</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'dj'</span><span class="p">)</span>
+<span class="n">Output</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
     <span class="p">(</span><span class="n">batch</span><span class="p">,</span> <span class="n">out_channel</span><span class="p">,</span> <span class="n">out_height</span><span class="p">,</span> <span class="n">out_width</span><span class="p">),</span>
-    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span>
+    <span class="k">lambda</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span>
         <span class="n">PaddedInput</span><span class="p">[</span><span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="o">/</span><span class="n">channel_multiplier</span><span class="p">,</span> <span class="n">i</span><span class="o">*</span><span class="n">stride_h</span> <span class="o">+</span> <span class="n">di</span><span class="p">,</span> <span class="n">j</span><span class="o">*</span><span class="n">stride_w</span> <span class="o">+</span> <sp [...]
         <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">di</span><span class="p">,</span> <span class="n">dj</span><span class="p">]),</span>
     <span class="n">name</span><span class="o">=</span><span class="s">'DepthwiseConv2d'</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <h2 id="general-gpu-optimization-guidelines">General GPU Optimization Guidelines</h2>
 
@@ -240,22 +251,24 @@ To avoid bank conflicts, it’s better that successive threads access successive
 <h3 id="compute-paddedinput-inline-to-save-memory-allocation">Compute PaddedInput Inline to Save Memory Allocation</h3>
 <p>As we see from part 1, padding is declared explicitly as a separate stage. We compute it inline to avoid redundant memory allocation:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">PaddedInput</span><span class="p">].</span><span class="n">compute_inline</span><span class="p">()</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">PaddedInput</span><span class="p">]</span><span class="o">.</span><span class="n">compute_inline</span><span class="p">()</span>
+</code></pre>
+</div>
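
Stripped of highlighting markup, the two schedule lines in this hunk are:

    s = tvm.create_schedule(Output.op)
    s[PaddedInput].compute_inline()

compute_inline folds the padding stage into its consumer, so no separate PaddedInput buffer is allocated.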
 
 <h3 id="divide-one-large-channel-into-smaller-blocks">Divide One Large Channel into Smaller Blocks</h3>
 <p>One straightforward schedule for depthwise convolution is that one cuda block takes care of one input channel and corresponding filters, loading them into shared memory and then computing:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">IS</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">PaddedInput</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
-<span class="n">FS</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">Filter</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
-<span class="n">block_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">)</span>
-<span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
-<span class="c1"># bind the dimension of batch (N in NCHW) with block_y
-</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">block_y</span><span class="p">)</span>
-<span class="c1"># bind the dimension of channel (C in NCHW) with block_x
-</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">block_x</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">IS</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">PaddedInput</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
+<span class="n">FS</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">cache_read</span><span class="p">(</span><span class="n">Filter</span><span class="p">,</span> <span class="s">"shared"</span><span class="p">,</span> <span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">])</span>
+<span class="n">block_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">)</span>
+<span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
+<span class="c"># bind the dimension of batch (N in NCHW) with block_y</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">block_y</span><span class="p">)</span>
+<span class="c"># bind the dimension of channel (C in NCHW) with block_x</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">block_x</span><span class="p">)</span>
+</code></pre>
+</div>
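
Decoded from the spans above, the caching and block-binding schedule reads:

    # stage shared-memory copies of the padded input and the filter
    IS = s.cache_read(PaddedInput, "shared", [DepthwiseConv2d])
    FS = s.cache_read(Filter, "shared", [DepthwiseConv2d])
    block_y = tvm.thread_axis("blockIdx.y")
    block_x = tvm.thread_axis("blockIdx.x")
    # bind the dimension of batch (N in NCHW) with block_y
    s[Output].bind(Output.op.axis[0], block_y)
    # bind the dimension of channel (C in NCHW) with block_x
    s[Output].bind(Output.op.axis[1], block_x)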
 
 <p>We test the average time cost of 1000 runs on GTX 1080, and compare with <a href="https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/convolution#depthwise_conv2d">depthwise_conv2d in tensorflow</a>.
 Here is the result:</p>
@@ -308,18 +321,19 @@ One main reason is that too much shared memory allocated to one block limits the
 <p>We modify the schedule to divide one large channel into smaller blocks. For example, one channel (64 x 64 or 96 x 96) is divided into blocks of 32 x 32,
 and one cuda block takes care of one 32 x 32 block:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">blocking_h</span> <span class="o">=</span> <span class="mi">32</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">blocking_h</span> <span class="o">=</span> <span class="mi">32</span>
 <span class="n">blocking_w</span> <span class="o">=</span> <span class="mi">32</span>
-<span class="c1"># split the dimension of height (H in NCHW)
-</span><span class="n">bx1</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">factor</sp [...]
-<span class="c1"># split the dimension of width (W in NCHW)
-</span><span class="n">bx2</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="n">factor</sp [...]
-<span class="c1"># assign one 32 x 32 block to one cuda block
-</span><span class="n">by</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">fuse</span><span class="p">(</span><span class="n">Output</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">Output</span><span class="p">.</span><span class="n">op</span [...]
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">by</span><span class="p">,</span> <span class="n">block_y</span><span class="p">)</span>
-<span class="n">bx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">fuse</span><span class="p">(</span><span class="n">bx1</span><span class="p">,</span> <span class="n">bx2</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="c"># split the dimension of height (H in NCHW)</span>
+<span class="n">bx1</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span clas [...]
+<span class="c"># split the dimension of width (W in NCHW)</span>
+<span class="n">bx2</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span clas [...]
+<span class="c"># assign one 32 x 32 block to one cuda block</span>
+<span class="n">by</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">fuse</span><span class="p">(</span><span class="n">Output</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">Output</span><span class="o">.</span><span cl [...]
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">by</span><span class="p">,</span> <span class="n">block_y</span><span class="p">)</span>
+<span class="n">bx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">fuse</span><span class="p">(</span><span class="n">bx1</span><span class="p">,</span> <span class="n">bx2</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
+</code></pre>
+</div>
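
The blocking schedule in this hunk, decoded (the tails of the split and fuse calls are truncated in the archive; the factor= and axis arguments here are reconstructed and best-effort):

    blocking_h = 32
    blocking_w = 32
    # split the dimension of height (H in NCHW)
    bx1, _ = s[Output].split(Output.op.axis[2], factor=blocking_h)
    # split the dimension of width (W in NCHW)
    bx2, _ = s[Output].split(Output.op.axis[3], factor=blocking_w)
    # assign one 32 x 32 block to one cuda block
    by = s[Output].fuse(Output.op.axis[0], Output.op.axis[1])
    s[Output].bind(by, block_y)
    bx = s[Output].fuse(bx1, bx2)
    s[Output].bind(bx, block_x)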
 
 <p>Here is the new result:</p>
 
@@ -354,18 +368,19 @@ and one cuda block takes care of one 32 x 32 block:</p>
 
 <p>How to schedule the workload, say, 32x32 among the threads of one cuda block? Intuitively, it should be like this:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_thread_y</span> <span class="o">=</span> <span class="mi">8</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">num_thread_y</span> <span class="o">=</span> <span class="mi">8</span>
 <span class="n">num_thread_x</span> <span class="o">=</span> <span class="mi">8</span>
-<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
-<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
-<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
-<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">reorder</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
-</code></pre></div></div>
-
-<p>There are two parameters in the schedule: <code class="language-plaintext highlighter-rouge">num_thread_y</code> and <code class="language-plaintext highlighter-rouge">num_thread_x</code>. How to determine the optimal combination of them? 
+<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
+<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
+<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
+<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+</code></pre>
+</div>
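
In plain Python, the thread-binding schedule above is:

    num_thread_y = 8
    num_thread_x = 8
    thread_y = tvm.thread_axis((0, num_thread_y), "threadIdx.y")
    thread_x = tvm.thread_axis((0, num_thread_x), "threadIdx.x")
    # h_dim and w_dim are the within-block height/width axes left over
    # from the 32 x 32 blocking in the previous step
    ty, yi = s[Output].split(h_dim, nparts=num_thread_y)
    tx, xi = s[Output].split(w_dim, nparts=num_thread_x)
    s[Output].reorder(ty, tx, yi, xi)
    s[Output].bind(ty, thread_y)
    s[Output].bind(tx, thread_x)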
+
+<p>There are two parameters in the schedule: <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code>. How to determine the optimal combination of them? 
 Well, let’s first do some experiments. Below is the result with Filter = [256, 1, 3, 3] and stride = [1, 1]:</p>
 
 <table>
@@ -421,7 +436,7 @@ It has better data reuse than case 1’s 4x1 tile.</p>
     <p>Case 3 is slower than case 2. It’s because in case 3, the workload per thread is too large and leads to much cost of local memory read.</p>
   </li>
   <li>
-    <p>Case 4 is slower than case 3. It’s because <code class="language-plaintext highlighter-rouge">num_thread_x = 32</code> ensures no bank conflicts, while <code class="language-plaintext highlighter-rouge">num_thread_y = 32</code> doesn’t.</p>
+    <p>Case 4 is slower than case 3. It’s because <code class="highlighter-rouge">num_thread_x = 32</code> ensures no bank conflicts, while <code class="highlighter-rouge">num_thread_y = 32</code> doesn’t.</p>
   </li>
 </ul>
 
@@ -429,14 +444,14 @@ It has better data reuse than case 1’s 4x1 tile.</p>
 
 <ul>
   <li>Large tile is good for data reuse, but not good for local memory read.</li>
-  <li>The influence of <code class="language-plaintext highlighter-rouge">num_thread_y</code> and <code class="language-plaintext highlighter-rouge">num_thread_x</code> on bank conflicts is asymmetric.</li>
-  <li>To find the optimal combination of <code class="language-plaintext highlighter-rouge">num_thread_y</code> and <code class="language-plaintext highlighter-rouge">num_thread_x</code> is to achieve a balance of efficient shared memory access (avoid bank conflicts), data reuse, and local memory read.</li>
+  <li>The influence of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> on bank conflicts is asymmetric.</li>
+  <li>To find the optimal combination of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> is to achieve a balance of efficient shared memory access (avoid bank conflicts), data reuse, and local memory read.</li>
 </ul>
 
 <p>Pretty tricky. So, what exactly should we do to find the optimal combination? The answer is brute force search. 
-We can pass <code class="language-plaintext highlighter-rouge">num_thread_y</code> and <code class="language-plaintext highlighter-rouge">num_thread_x</code> as arguments to the schedule function, and try all possible combinations to find the optimal one. This can be done easily in TVM:</p>
+We can pass <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> as arguments to the schedule function, and try all possible combinations to find the optimal one. This can be done easily in TVM:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">schedule_depthwise_conv2d</span><span class="p">(...,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">schedule_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="mi">8</span><span class="p">):</span>
     <span class="n">num_thread_y</span> <span class="o">=</span> <span class="n">num_thread_y</span>
     <span class="n">num_thread_x</span> <span class="o">=</span> <span class="n">num_thread_x</span>
     <span class="n">do_schedule_as_usual</span>
@@ -444,49 +459,51 @@ We can pass <code class="language-plaintext highlighter-rouge">num_thread_y</cod
 
 <span class="n">min_time_cost</span> <span class="o">=</span> <span class="n">inf</span>
 <span class="k">for</span> <span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span> <span class="ow">in</span> <span class="n">all_possible_combinations</span><span class="p">:</span>
-    <span class="n">schedule</span> <span class="o">=</span> <span class="n">schedule_depthwise_conv2d</span><span class="p">(...,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
-    <span class="n">time_cost</span> <span class="o">=</span> <span class="n">test_depthwise_conv2d</span><span class="p">(...,</span> <span class="n">schedule</span><span class="p">)</span>
+    <span class="n">schedule</span> <span class="o">=</span> <span class="n">schedule_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+    <span class="n">time_cost</span> <span class="o">=</span> <span class="n">test_depthwise_conv2d</span><span class="p">(</span><span class="o">...</span><span class="p">,</span> <span class="n">schedule</span><span class="p">)</span>
     <span class="k">if</span> <span class="n">time_cost</span> <span class="o">&lt;</span> <span class="n">min_time_cost</span><span class="p">:</span>
         <span class="n">min_time_cost</span> <span class="o">=</span> <span class="n">time_cost</span>
         <span class="n">optimal_combination</span> <span class="o">=</span> <span class="p">[</span><span class="n">num_thread_y</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">]</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>In fact, this search loop can be seen as a simple auto-scheduler.</p>
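 
 <p>For completeness, here is one way the <code class="highlighter-rouge">all_possible_combinations</code> list above could be generated. This is a minimal sketch: the power-of-two candidates and the 1024-thread block limit are illustrative assumptions, not part of the original schedule:</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code># Enumerate candidate (num_thread_y, num_thread_x) pairs for the search above.
from itertools import product

candidates = [1, 2, 4, 8, 16, 32]
all_possible_combinations = [
    (ty, tx) for ty, tx in product(candidates, candidates)
    if ty * tx &lt;= 1024  # stay within the per-block thread limit
]
</code></pre>
</div>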
 
 <h3 id="vthread-and-strided-patterns">Vthread and Strided Patterns</h3>
 <p>Vthread (virtual thread) in TVM is introduced to support strided patterns. We can use it this way:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_vthread_y</span> <span class="o">=</span> <span class="mi">2</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">num_vthread_y</span> <span class="o">=</span> <span class="mi">2</span>
 <span class="n">num_vthread_x</span> <span class="o">=</span> <span class="mi">2</span>
 <span class="n">num_thread_y</span> <span class="o">=</span> <span class="mi">8</span>
 <span class="n">num_thread_x</span> <span class="o">=</span> <span class="mi">8</span>
-<span class="n">thread_vy</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_y</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vy"</span><span class="p">)</span>
-<span class="n">thread_vx</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_x</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vx"</span><span class="p">)</span>
-<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
-<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
-<span class="c1"># split the dimension of height (H in NCHW) twice
-</span><span class="n">tvy</span><span class="p">,</span> <span class="n">vyi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_y</span><span class="p">)</span>
-<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">vyi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
-<span class="c1"># split the dimension of width (W in NCHW) twice
-</span><span class="n">tvx</span><span class="p">,</span> <span class="n">vxi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_x</span><span class="p">)</span>
-<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">vxi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
-<span class="c1"># bind thread and vthread respectively
-</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">thread_vy</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">tvx</span><span class="p">,</span> <span class="n">thread_vx</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">].</span><span class="n">reorder</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">tvx</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="n">thread_vy</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_y</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vy"</span><span class="p">)</span>
+<span class="n">thread_vx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_vthread_x</span><span class="p">),</span> <span class="s">"vthread"</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"vx"</span><span class="p">)</span>
+<span class="n">thread_y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_y</span><span class="p">),</span> <span class="s">"threadIdx.y"</span><span class="p">)</span>
+<span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">num_thread_x</span><span class="p">),</span> <span class="s">"threadIdx.x"</span><span class="p">)</span>
+<span class="c"># split the dimension of height (H in NCHW) twice</span>
+<span class="n">tvy</span><span class="p">,</span> <span class="n">vyi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">h_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_y</span><span class="p">)</span>
+<span class="n">ty</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">vyi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_y</span><span class="p">)</span>
+<span class="c"># split the dimension of width (W in NCHW) twice</span>
+<span class="n">tvx</span><span class="p">,</span> <span class="n">vxi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">w_dim</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_vthread_x</span><span class="p">)</span>
+<span class="n">tx</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">vxi</span><span class="p">,</span> <span class="n">nparts</span><span class="o">=</span><span class="n">num_thread_x</span><span class="p">)</span>
+<span class="c"># bind thread and vthread respectively</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">thread_vy</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tvx</span><span class="p">,</span> <span class="n">thread_vx</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ty</span><span class="p">,</span> <span class="n">thread_y</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">tvy</span><span class="p">,</span> <span class="n">tvx</span><span class="p">,</span> <span class="n">ty</span><span class="p">,</span> <span class="n">tx</span><span class="p">,</span> <span class="n">yi</span><span class="p">,</span> <span class="n">xi</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>Let’s print the IR to see what vthread does:</p>
 
-<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<div class="language-c++ highlighter-rouge"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
 <span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
-  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
-  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
-  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
-  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
-  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1
+</span>  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1
+</span>  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8
+</span>  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8
+</span>  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
     <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span>
       <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><spa [...]
       <span class="n">DepthwiseConv2d</span><span class="p">[(((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">16</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><sp [...]
@@ -503,17 +520,18 @@ We can pass <code class="language-plaintext highlighter-rouge">num_thread_y</cod
     <span class="p">}</span>
   <span class="p">}</span>
 <span class="p">}</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>Without vthread (i.e., with both vthread extents set to 1), the IR is:</p>
 
-<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<div class="language-c++ highlighter-rouge"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
 <span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
-  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
-  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
-  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
-  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
-  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1
+</span>  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1
+</span>  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8
+</span>  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8
+</span>  <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
     <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">.</span><span class="n">inner</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
       <span class="n">DepthwiseConv2d</span><span class="p">[((((((((</span><span class="n">blockIdx</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span><span class="p">)</span><span class="o">*</span><span class="mi">8</span><span class="p">)</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">y</span><span class="p">)</span><span [...]
       <span class="k">for</span> <span class="p">(</span><span class="n">di</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="p">{</span>
@@ -524,9 +542,10 @@ We can pass <code class="language-plaintext highlighter-rouge">num_thread_y</cod
     <span class="p">}</span>
   <span class="p">}</span>
 <span class="p">}</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>As we can see, when <code class="language-plaintext highlighter-rouge">num_vthread_y = 2</code> and <code class="language-plaintext highlighter-rouge">num_vthread_x = 2</code>, the 32 x 32 channel is divided into four sub-channels of 16 x 16.
+<p>As we can see, when <code class="highlighter-rouge">num_vthread_y = 2</code> and <code class="highlighter-rouge">num_vthread_x = 2</code>, the 32 x 32 channel is divided into four sub-channels of 16 x 16.
 Each thread computes four output elements at a time, one element per sub-channel.</p>
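 
 <p>To make the strided pattern concrete, below is a plain-Python reconstruction of the index arithmetic implied by the two splits above. The formula <code class="highlighter-rouge">h = vy*16 + ty*2 + yi</code> is derived from the split extents, not quoted from the IR, so treat it as a sketch:</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code># For fixed inner-loop indices (yi, xi), thread (ty, tx) writes one element in
# each of the four 16 x 16 sub-channels (num_vthread = 2, num_thread = 8,
# 32 x 32 output, extents hardcoded from the example).
def elements_at(ty, tx, yi, xi):
    return [(vy * 16 + ty * 2 + yi, vx * 16 + tx * 2 + xi)
            for vy in range(2)   # vthread y
            for vx in range(2)]  # vthread x

print(elements_at(ty=0, tx=0, yi=0, xi=0))
# [(0, 0), (0, 16), (16, 0), (16, 16)] -- one element per sub-channel
</code></pre>
</div>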
 
 <p>Below is the result with Filter = [256, 1, 3, 3], stride = [1, 1], blocking_h = 32, blocking_w = 32:</p>
@@ -582,7 +601,7 @@ table th:nth-of-type(2) {
   </tbody>
 </table>
 
-<p>Case 2 is faster than case 1. It’s because in case 2 <code class="language-plaintext highlighter-rouge">num_thread_x=8</code> and <code class="language-plaintext highlighter-rouge">num_vthread_x=4</code> together ensures that consecutive threads access consecutive memory addresses,
+<p>Case 2 is faster than case 1 because in case 2, <code class="highlighter-rouge">num_thread_x=8</code> and <code class="highlighter-rouge">num_vthread_x=4</code> together ensure that consecutive threads access consecutive memory addresses,
 thus avoiding bank conflicts, as illustrated below (each color represents one thread’s workload):</p>
 
 <p style="text-align: center"><img src="/images/depthconv_tutorial/vthread_and_strided_pattern.png" alt="image" width="90%" /></p>
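 
 <p>The bank-conflict claim rests on a general CUDA fact rather than anything specific to this schedule: shared memory is organized into 32 four-byte banks, so threads touching consecutive <code class="highlighter-rouge">float32</code> addresses hit distinct banks, while a stride of 32 floats makes them all collide. A small sanity check:</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code># Bank index of a 4-byte word on a 32-bank GPU: (byte_address // 4) % 32.
def banks(byte_addresses):
    return [(a // 4) % 32 for a in byte_addresses]

coalesced = [4 * t for t in range(8)]      # 8 threads read consecutive floats
strided = [4 * 32 * t for t in range(8)]   # 8 threads read with stride 32
print(banks(coalesced))  # [0, 1, 2, 3, 4, 5, 6, 7] -- no conflict
print(banks(strided))    # [0, 0, 0, 0, 0, 0, 0, 0] -- 8-way conflict
</code></pre>
</div>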
@@ -646,30 +665,31 @@ vthread saves additional 5us.</p>
 <p>One typical optimization we can apply in deep learning is operator fusion, which computes multiple operators together in a single kernel without saving intermediate results back to global memory.
 TVM supports this out of the box.</p>
 
-<p>Consider a common pattern in neural networks: <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code> + <code class="language-plaintext highlighter-rouge">scale_shift</code> + <code class="language-plaintext highlighter-rouge">relu</code>. We can fuse the three operators into one, by slightly modifying the original schedule:</p>
+<p>Consider a common pattern in neural networks: <code class="highlighter-rouge">depthwise_conv2d</code> + <code class="highlighter-rouge">scale_shift</code> + <code class="highlighter-rouge">relu</code>. We can fuse the three operators into one by slightly modifying the original schedule:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DepthwiseConv2d</span> <span class="o">=</span> <span class="n">topi</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">depthwise_conv2d</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">Filter</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span [...]
-<span class="n">ScaleShift</span> <span class="o">=</span> <span class="n">topi</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">scale_shift</span><span class="p">(</span><span class="n">DepthwiseConv2d</span><span class="p">,</span> <span class="n">Scale</span><span class="p">,</span> <span class="n">Shift</span><span class="p">)</span>
-<span class="n">Relu</span> <span class="o">=</span> <span class="n">topi</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">relu</span><span class="p">(</span><span class="n">ScaleShift</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">DepthwiseConv2d</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">depthwise_conv2d</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">Filter</span><span class="p">,</span> <span class="n">stride</span><span class="p">,</span> <span class="n">padding</spa [...]
+<span class="n">ScaleShift</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">scale_shift</span><span class="p">(</span><span class="n">DepthwiseConv2d</span><span class="p">,</span> <span class="n">Scale</span><span class="p">,</span> <span class="n">Shift</span><span class="p">)</span>
+<span class="n">Relu</span> <span class="o">=</span> <span class="n">topi</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">relu</span><span class="p">(</span><span class="n">ScaleShift</span><span class="p">)</span>
 
-<span class="n">Output</span> <span class="o">=</span> <span class="n">Relu</span> <span class="c1"># is no longer DepthwiseConv2d
-</span><span class="n">s</span><span class="p">[</span><span class="n">ScaleShift</span><span class="p">].</span><span class="n">compute_inline</span><span class="p">()</span> <span class="c1"># this line fuses ScaleShift, explicitly
-</span><span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">].</span><span class="n">set_scope</span><span class="p">(</span><span class="s">"local"</span><span class="p">)</span> <span class="c1"># this line fuses DepthwiseConv2d, implicitly
-</span><span class="n">schedule</span><span class="p">(</span><span class="n">Output</span><span class="p">)</span> <span class="c1"># schedule for Output the same way we schedule for DepthwiseConv2d as discussed above
-</span><span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">].</span><span class="n">compute_at</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">],</span> <span class="n">tx</span><span class="p">)</span> <span class="c1"># tx is the inner most axis, bound to threadIdx.x
-</span></code></pre></div></div>
+<span class="n">Output</span> <span class="o">=</span> <span class="n">Relu</span> <span class="c"># is no longer DepthwiseConv2d</span>
+<span class="n">s</span><span class="p">[</span><span class="n">ScaleShift</span><span class="p">]</span><span class="o">.</span><span class="n">compute_inline</span><span class="p">()</span> <span class="c"># this line fuses ScaleShift, explicitly</span>
+<span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">]</span><span class="o">.</span><span class="n">set_scope</span><span class="p">(</span><span class="s">"local"</span><span class="p">)</span> <span class="c"># this line fuses DepthwiseConv2d, implicitly</span>
+<span class="n">schedule</span><span class="p">(</span><span class="n">Output</span><span class="p">)</span> <span class="c"># schedule for Output the same way we schedule for DepthwiseConv2d as discussed above</span>
+<span class="n">s</span><span class="p">[</span><span class="n">DepthwiseConv2d</span><span class="p">]</span><span class="o">.</span><span class="n">compute_at</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="n">Output</span><span class="p">],</span> <span class="n">tx</span><span class="p">)</span> <span class="c"># tx is the inner most axis, bound to threadIdx.x</span>
+</code></pre>
+</div>
 
 <p>It generates IR like this:</p>
 
-<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
+<div class="language-c++ highlighter-rouge"><pre class="highlight"><code><span class="cm">/* Input = [1, 1, 32, 32], Filter = [1, 1, 3, 3], stride = [1, 1], padding = 'SAME' */</span>
 <span class="n">produce</span> <span class="n">Relu</span> <span class="p">{</span>
-  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1</span>
-  <span class="c1">// attr [DepthwiseConv2d] storage_scope = "local"</span>
-  <span class="n">allocate</span> <span class="n">DepthwiseConv2d</span><span class="p">[</span><span class="n">float32</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">*</span> <span class="mi">4</span><span class="p">]</span>
-  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1</span>
-  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8</span>
-  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8</span>
-  <span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
+  <span class="c1">// attr [iter_var(blockIdx.y, , blockIdx.y)] thread_extent = 1
+</span>  <span class="c1">// attr [DepthwiseConv2d] storage_scope = "local"
+</span>  <span class="n">allocate</span> <span class="n">DepthwiseConv2d</span><span class="p">[</span><span class="n">float32</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">*</span> <span class="mi">4</span><span class="p">]</span>
+  <span class="c1">// attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 1
+</span>  <span class="c1">// attr [iter_var(threadIdx.y, Range(min=0, extent=8), threadIdx.y)] thread_extent = 8
+</span>  <span class="c1">// attr [iter_var(threadIdx.x, Range(min=0, extent=8), threadIdx.x)] thread_extent = 8
+</span>  <span class="n">produce</span> <span class="n">DepthwiseConv2d</span> <span class="p">{</span>
     <span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
       <span class="k">for</span> <span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="p">{</span>
         <span class="n">DepthwiseConv2d</span><span class="p">[((</span><span class="n">i</span><span class="o">*</span><span class="mi">4</span><span class="p">)</span> <span class="o">+</span> <span class="n">j</span><span class="p">)]</span> <span class="o">=</span> <span class="mf">0.000000</span><span class="n">f</span>
@@ -687,16 +707,17 @@ TVM supports that out of the box.</p>
     <span class="p">}</span>
   <span class="p">}</span>
 <span class="p">}</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>As we can see, each thread computes <code class="language-plaintext highlighter-rouge">scale_shift</code> and <code class="language-plaintext highlighter-rouge">relu</code> before writing the result of <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code> to global memory. The fused operator is as fast as single <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code>.
+<p>As we can see, each thread computes <code class="highlighter-rouge">scale_shift</code> and <code class="highlighter-rouge">relu</code> before writing the result of <code class="highlighter-rouge">depthwise_conv2d</code> to global memory. The fused operator is as fast as a single <code class="highlighter-rouge">depthwise_conv2d</code>.
 Below is the result with Input = [1, 256, 96, 96], Filter = [256, 1, 3, 3], stride = [1, 1], padding = ‘SAME’:</p>
 
 <ul>
-  <li>tf-1.2 <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code>: 251.6 us</li>
-  <li>tf-1.2 <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code> + <code class="language-plaintext highlighter-rouge">scale_shift</code> + <code class="language-plaintext highlighter-rouge">relu</code> (separate): 419.9 us</li>
-  <li>TVM <code class="language-plaintext highlighter-rouge">depthwise_conv2d</code>: 90.9 us</li>
-  <li>TVM <code class="language-plaintext highlighter-rouge">depthwise_conv2d + scale_shift + relu</code> (fused): 91.5 us</li>
+  <li>tf-1.2 <code class="highlighter-rouge">depthwise_conv2d</code>: 251.6 us</li>
+  <li>tf-1.2 <code class="highlighter-rouge">depthwise_conv2d</code> + <code class="highlighter-rouge">scale_shift</code> + <code class="highlighter-rouge">relu</code> (separate): 419.9 us</li>
+  <li>TVM <code class="highlighter-rouge">depthwise_conv2d</code>: 90.9 us</li>
+  <li>TVM <code class="highlighter-rouge">depthwise_conv2d + scale_shift + relu</code> (fused): 91.5 us</li>
 </ul>
 
 <p>The advantage of operator fusion is obvious.</p>
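 
 <p>For reference, numbers like these are typically collected with TVM’s built-in time evaluator. A minimal sketch, assuming the tensors from the fusion snippet above are in scope (the helper names follow the TVM/TOPI API of this era):</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code># Time the fused kernel; s, Input, Filter, Scale, Shift, Relu come from above.
import numpy as np
import tvm
from topi.util import get_const_tuple

func = tvm.build(s, [Input, Filter, Scale, Shift, Relu], target="cuda")
ctx = tvm.gpu(0)
args = [tvm.nd.array(
            np.random.uniform(size=get_const_tuple(t.shape)).astype(t.dtype), ctx)
        for t in (Input, Filter, Scale, Shift, Relu)]
timer = func.time_evaluator(func.entry_name, ctx, number=1000)
print("fused kernel: %.1f us" % (timer(*args).mean * 1e6))
</code></pre>
</div>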
@@ -705,9 +726,9 @@ Below is the result with Input = [1, 256, 96, 96], Filter = [256, 1, 3, 3], stri
 
 <h2 id="show-me-the-code">Show me the code</h2>
 <ul>
-  <li>Declare: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py">https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py</a></li>
-  <li>Schedule: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py">https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
-  <li>Test: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py">https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
+  <li>Declare: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py</a></li>
+  <li>Schedule: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
+  <li>Test: <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py">https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
 </ul>
 
 <h2 id="acknowledgements">Acknowledgements</h2>
@@ -729,41 +750,37 @@ He is experiencing a gap year after obtaining a bachelor’s degree in electrica
 </div>
 </div>
 
 </html>
diff --git a/2017/10/06/nnvm-compiler-announcement.html b/2017/10/06/nnvm-compiler-announcement.html
index d3eb49f..792bc1b 100644
--- a/2017/10/06/nnvm-compiler-announcement.html
+++ b/2017/10/06/nnvm-compiler-announcement.html
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>NNVM Compiler: Open Compiler for AI Frameworks </h1>
       <p class="post-meta">
-        <time datetime="2017-10-06T11:30:00-04:00" itemprop="datePublished">
+        <time datetime="2017-10-06T08:30:00-07:00" itemprop="datePublished">
           Oct 6, 2017
         </time>
         
@@ -229,41 +239,37 @@ We also learns from Halide when implementing the lowering pipeline in TVM.</li>
 </div>
 </div>
 
 </html>
diff --git a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
index 1b48741..cb4899a 100644
--- a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
+++ b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm </h1>
       <p class="post-meta">
-        <time datetime="2017-10-30T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2017-10-30T00:00:00-07:00" itemprop="datePublished">
           Oct 30, 2017
         </time>
         
@@ -177,7 +187,7 @@
 
 <ul>
   <li>Loads the Resnet 50 model from <a href="https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html">the Gluon model zoo</a></li>
-  <li>Converts Gluon Resnet 50 model to NNVM graph format, using <code class="language-plaintext highlighter-rouge">nnvm.frontend.from_mxnet (...)</code></li>
+  <li>Converts the Gluon Resnet 50 model to NNVM graph format, using <code class="highlighter-rouge">nnvm.frontend.from_mxnet(...)</code></li>
   <li>Compiles and executes the graph with the ROCm backend (see the sketch below)</li>
 </ul>
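 
 <p>A rough sketch of those three steps in the NNVM-era Python API follows. The input shape and the <code class="highlighter-rouge">resnet50_v1</code> model-zoo entry are assumptions for illustration; the linked script is the authoritative version:</p>
 
 <figure class="highlight"><pre><code class="language-python" data-lang="python"># Gluon model, to NNVM graph, to a compiled ROCm module (illustrative sketch).
import nnvm.compiler
import nnvm.frontend
import tvm
from tvm.contrib import graph_runtime
from mxnet.gluon.model_zoo.vision import resnet50_v1

block = resnet50_v1(pretrained=True)
sym, params = nnvm.frontend.from_mxnet(block)
graph, lib, params = nnvm.compiler.build(
    sym, target="rocm", shape={"data": (1, 3, 224, 224)}, params=params)
module = graph_runtime.create(graph, lib, tvm.rocm(0))
</code></pre></figure>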
 
@@ -194,7 +204,7 @@ TVM prediction top-1: 282 tiger cat</code></pre></figure>
 
 <p>The script <a href="https://github.com/ROCmSoftwarePlatform/nnvm-rocm/blob/master/advanced_superres_onnx.py">advanced_superres_onnx.py</a> gives an example of loading a model trained with PyTorch. The model is stored in the <a href="https://onnx.ai/">ONNX</a> format. In this example, our network takes a low-resolution image as input and outputs a 4x high-resolution image. For the details of the problem setup and the network architecture, we refer to <a href="https://arxiv.org/abs/1609.0480 [...]
 
-<p>In order to use models in the ONNX format with NNVM, we first use <a href="https://github.com/onnx/onnx">the ONNX library</a> to load the ONNX model into the Protocol buffer object. We can then use <code class="language-plaintext highlighter-rouge">nnvm.frontend.from_onnx(...)</code> to obtain an equivalent NNVM graph. With a NNVM graph in hand, we can follow the generic workflow of compilation and graph execution outlined above.</p>
+<p>In order to use models in the ONNX format with NNVM, we first use <a href="https://github.com/onnx/onnx">the ONNX library</a> to load the ONNX model into a protocol buffer object. We can then use <code class="highlighter-rouge">nnvm.frontend.from_onnx(...)</code> to obtain an equivalent NNVM graph. With an NNVM graph in hand, we can follow the generic workflow of compilation and graph execution outlined above.</p>
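 
 <p>A minimal sketch of that loading path (the file name here is a placeholder, and the real script does more argument handling):</p>
 
 <figure class="highlight"><pre><code class="language-python" data-lang="python"># Load an ONNX file into a protobuf object, then convert it to an NNVM graph.
import onnx
import nnvm.frontend

model = onnx.load("super_resolution.onnx")    # protocol buffer object
sym, params = nnvm.frontend.from_onnx(model)  # equivalent NNVM graph
</code></pre></figure>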
 
 <p style="text-align: center"><img src="/images/rocm/butterfly.png" alt="image" /></p>
 
@@ -204,7 +214,7 @@ TVM prediction top-1: 282 tiger cat</code></pre></figure>
 
 <h2 id="a-note-on-performance">A Note on performance</h2>
 
-<p>The current ROCm support focuses on functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for the CUDA backend. For example, you can try running <a href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py">the gemm test script</a> in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically o [...]
+<p>The current ROCm support focuses on functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for the CUDA backend. For example, you can try running <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py">the gemm test script</a> in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized f [...]
This is already a promising start, as it is very hard to push performance all the way to peak, and we
have not yet applied any AMD GPU-specific optimizations.
We are starting to look at performance optimization and expect more improvements to come.</p>
@@ -217,48 +227,48 @@ We are starting to look at performance optimization and we expect more improveme
 
 <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">absolute_import</span><span class="p">,</span> <span class="n">print_function</span>
 <span class="kn">import</span> <span class="nn">tvm</span>
-<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
 
-<span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">var</span><span class="p">(</span><span class="s">"n"</span><span class="p">)</span>
-<span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
-<span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
-<span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span><span class="n">A</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">B</span><span class= [...]
-<span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="p">.</span><span class="n">op</span><span class="p">)</span>
-<span class="n">bx</span><span class="p">,</span> <span class="n">tx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">factor</span><span class="o [...]
-<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
-<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span></code></pre></figure>
+<span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s">"n"</span><span class="p">)</span>
+<span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
+<span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
+<span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">:</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">B</span><span class= [...]
+<span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+<span class="n">bx</span><span class="p">,</span> <span class="n">tx</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">fact [...]
+<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bx</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
+<span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">tx</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span></code></pre></figure>
 
<p>Next, to use the ROCm backend, we build our kernel under the “rocm” target. This causes TVM to use our new code generator. We also need a runtime context for the ROCm backend.</p>
 
 <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">target</span> <span class="o">=</span> <span class="s">"rocm"</span>
-<span class="n">fadd_rocm</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">target</span><span class="p">,</span> <span class="n">target_host</span><span class="o">=</span [...]
-<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">rocm</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></code></pre></figure>
+<span class="n">fadd_rocm</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">target</span><span class="p">,</span> <span class="n">target_host</span><span class="o">=</span [...]
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">rocm</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span></code></pre></figure>
 
 <p>After building the kernel and setting up a runtime context, we can launch our vector add kernel.</p>
 
 <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">n</span> <span class="o">=</span> <span class="mi">1024</span>
-<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">).</span><span class="n">astype</span><span  [...]
-<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">).</span><span class="n">astype</span><span  [...]
-<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">C</span><span class="p">.</span><span class="n">dtype</span><span class=" [...]
+<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="o">.</span><span class= [...]
+<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">n</span><span class="p">)</span><span class="o">.</span><span class= [...]
+<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">C</span><span class="o">.</span><span class="n">dtype</span><span class=" [...]
 
 <span class="n">fadd_rocm</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span>
-<span class="n">np</span><span class="p">.</span><span class="n">testing</span><span class="p">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">a</span><span class="p">.</span><span class="n">asnumpy</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="p">.</span><span class="n">asnumpy</span><span class="p" [...]
+<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">a</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span> <span class="o">+</span> <span class="n">b</span><span class="o">.</span><span class="n">asnumpy</span><span class="p" [...]
 
<p>We can view the LLVM IR that TVM generates as follows:</p>
 
-<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">dev_module</span> <span class="o">=</span> <span class="n">fadd_rocm</span><span class="p">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
-<span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="p">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"llvm"</span><span class="p">))</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">dev_module</span> <span class="o">=</span> <span class="n">fadd_rocm</span><span class="o">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
+<span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="o">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"llvm"</span><span class="p">))</span></code></pre></figure>
 
 <p>You should see something like this:</p>
 
 <figure class="highlight"><pre><code class="language-llvm" data-lang="llvm"><span class="c1">; ModuleID = 'myadd__kernel0'</span>
-<span class="k">source_filename</span> <span class="p">=</span> <span class="s">"myadd__kernel0"</span>
+<span class="err">sour</span><span class="k">c</span><span class="err">e_filename</span> <span class="p">=</span> <span class="s">"myadd__kernel0"</span>
 <span class="k">target</span> <span class="k">datalayout</span> <span class="p">=</span> <span class="s">"e-p:32:32-p1:64:64-p2:64:64-p3:32:32-p4:64:64-p5:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64"</span>
 <span class="k">target</span> <span class="k">triple</span> <span class="p">=</span> <span class="s">"amdgcn-amd-amdhsa-hcc"</span>
 
 
 <span class="c1">; Function Attrs: nounwind</span>
-<span class="k">define</span> <span class="k">dllexport</span> <span class="k">amdgpu_kernel</span> <span class="kt">void</span> <span class="vg">@myadd__kernel0</span><span class="p">(</span><span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="k">noalias</span> <span class="k">nocapture</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p [...]
+<span class="k">define</span> <span class="k">dllexport</span> <span class="err">amdgpu_ker</span><span class="k">ne</span><span class="err">l</span> <span class="kt">void</span> <span class="vg">@myadd__kernel0</span><span class="p">(</span><span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="k">noalias</span> <span clas [...]
 <span class="nl">entry:</span>
   <span class="nv">%4</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workgroup.id.x</span><span class="p">()</span>
   <span class="nv">%5</span> <span class="p">=</span> <span class="k">tail</span> <span class="k">call</span> <span class="kt">i32</span> <span class="vg">@llvm.amdgcn.workitem.id.x</span><span class="p">()</span>
@@ -278,14 +288,14 @@ We are starting to look at performance optimization and we expect more improveme
   <span class="nv">%10</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
   <span class="nv">%11</span> <span class="p">=</span> <span class="k">add</span> <span class="k">nsw</span> <span class="kt">i32</span> <span class="nv">%.pre-phi</span><span class="p">,</span> <span class="nv">%5</span>
   <span class="nv">%12</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%11</span> <span class="k">to</span> <span class="kt">i64</span>
-  <span class="nv">%13</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%2</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
-  <span class="nv">%14</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%13</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!2</span>
-  <span class="nv">%15</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%1</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
-  <span class="nv">%16</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%15</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!6</span>
+  <span class="nv">%13</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%2</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%14</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%13</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
+  <span class="nv">%15</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%1</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%12</span>
+  <span class="nv">%16</span> <span class="p">=</span> <span class="k">load</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%15</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv" [...]
   <span class="nv">%17</span> <span class="p">=</span> <span class="k">fadd</span> <span class="kt">float</span> <span class="nv">%14</span><span class="p">,</span> <span class="nv">%16</span>
   <span class="nv">%18</span> <span class="p">=</span> <span class="k">sext</span> <span class="kt">i32</span> <span class="nv">%10</span> <span class="k">to</span> <span class="kt">i64</span>
-  <span class="nv">%19</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%18</span>
-  <span class="k">store</span> <span class="kt">float</span> <span class="nv">%17</span><span class="p">,</span> <span class="kt">float</span> <span class="k">addrspace</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%19</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span class="nv">!9</span>
+  <span class="nv">%19</span> <span class="p">=</span> <span class="k">getelementptr</span> <span class="k">inbounds</span> <span class="kt">float</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%0</span><span class="p">,</span> <span class="kt">i64</span> <span class="nv">%18</span>
+  <span class="k">store</span> <span class="kt">float</span> <span class="nv">%17</span><span class="p">,</span> <span class="kt">float</span> <span class="k">add</span><span class="err">rspa</span><span class="k">c</span><span class="err">e</span><span class="p">(</span><span class="m">1</span><span class="p">)*</span> <span class="nv">%19</span><span class="p">,</span> <span class="k">align</span> <span class="m">4</span><span class="p">,</span> <span class="nv">!tbaa</span> <span clas [...]
   <span class="k">br</span> <span class="kt">label</span> <span class="nv">%if_end</span>
 
 
@@ -302,7 +312,7 @@ We are starting to look at performance optimization and we expect more improveme
 
<p>We can also view the GPU assembly that the ROCm backend generates. This is the actual code that runs on your GPU.</p>
 
-<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="p">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"asm"</span><span class="p">))</span></code></pre></figure>
+<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">print</span><span class="p">(</span><span class="n">dev_module</span><span class="o">.</span><span class="n">get_source</span><span class="p">(</span><span class="s">"asm"</span><span class="p">))</span></code></pre></figure>
 
 <p>The assembly should look something like this, omitting unnecessary details:</p>
 
@@ -372,41 +382,37 @@ BB0_6:
 </div>
 </div>
 
diff --git a/2017/11/08/android-rpc-introduction.html b/2017/11/08/android-rpc-introduction.html
index 104829e..80b53fe 100644
--- a/2017/11/08/android-rpc-introduction.html
+++ b/2017/11/08/android-rpc-introduction.html
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Remote Profile and Test Deep Learning Cross Compilation on Mobile Phones with TVM RPC </h1>
       <p class="post-meta">
-        <time datetime="2017-11-08T00:00:00-05:00" itemprop="datePublished">
+        <time datetime="2017-11-08T00:00:00-08:00" itemprop="datePublished">
           Nov 8, 2017
         </time>
         
@@ -175,70 +185,74 @@ In order to optimize a computation task, one has to edit the code on the develop
 
 <h2 id="run-tvm-app-on-android-phone">Run TVM APP on Android Phone</h2>
 
-<p>You can find the Android RPC app in <a href="https://github.com/dmlc/tvm/tree/master/apps/android_rpc">apps/android_rpc</a>. Please follow the instructions there to build it for your Android device. Once the APK is built, sign it using <code class="language-plaintext highlighter-rouge">apps/android_rpc/dev_tools</code> and install it on the phone. The app looks like this:</p>
+<p>You can find the Android RPC app in <a href="https://github.com/dmlc/tvm/tree/master/apps/android_rpc">apps/android_rpc</a>. Please follow the instructions there to build it for your Android device. Once the APK is built, sign it using <code class="highlighter-rouge">apps/android_rpc/dev_tools</code> and install it on the phone. The app looks like this:</p>
 
 <p style="text-align: center"><img src="/images/android_rpc/app.png" alt="image" width="25%" />
 <img src="/images/android_rpc/app_error.png" alt="image" width="25%" /></p>
 
<p>Usually we cannot start a standalone server on a mobile phone; instead, we start a proxy server and use our app to connect.</p>
 
-<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python <span class="nt">-m</span> tvm.exec.rpc_proxy
-</code></pre></div></div>
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code>python -m tvm.exec.rpc_proxy
+</code></pre>
+</div>
 
 <h2 id="create-ndarray-on-the-phone">Create NDArray on the Phone</h2>
 
 <p>Now we can connect to the proxy server from the laptop:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span>
-<span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span>
+<span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
+</code></pre>
+</div>
 
-<p>This will give us a handler <code class="language-plaintext highlighter-rouge">remote</code> which we can use to communicate with the mobile phone. For instance, the following lines create a 1024x1024 matrix on the phone’s GPU:</p>
+<p>This will give us a handler <code class="highlighter-rouge">remote</code> which we can use to communicate with the mobile phone. For instance, the following lines create a 1024x1024 matrix on the phone’s GPU:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span>
-	<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">1024</span><span class="p">)).</span><span class="n">astype</span><span class="p">(</span><span class="n">dtype</span><span class="p">),</span>
-	<span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="p">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span>
+	<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="mi">1024</span><span class="p">,</span> <span class="mi">1024</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">dtype</span><span class="p">),</span>
+	<span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
+</code></pre>
+</div>
 
-<p>When <code class="language-plaintext highlighter-rouge">A.asnumpy()</code> is called from the laptop, the matrix <code class="language-plaintext highlighter-rouge">A</code> will be copied to the phone’s RAM and then transferred to the laptop through the proxy server. The TVM RPC interface is transparent to users.</p>
+<p>When <code class="highlighter-rouge">A.asnumpy()</code> is called from the laptop, the matrix <code class="highlighter-rouge">A</code> will be copied to the phone’s RAM and then transferred to the laptop through the proxy server. The TVM RPC interface is transparent to users.</p>
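<p>For instance, a round trip over RPC might look like this minimal sketch, reusing the array created above (the <code class="highlighter-rouge">a_np</code> name is ours):</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># The data travels phone RAM -&gt; proxy server -&gt; laptop behind the scenes.
a_np = A.asnumpy()
assert a_np.shape == (1024, 1024)
</code></pre></figure>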
 
 <h2 id="gemm-matrix-multiplication-on-the-phone">GEMM (Matrix Multiplication) on the Phone</h2>
 
<p>Now let us look at how to test matrix multiplication on an Android phone. First, let’s define a very simple GEMM schedule:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tvm</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tvm</span>
 <span class="k">def</span> <span class="nf">gemm</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">bn</span><span class="p">):</span>
-    <span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
-    <span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
-    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
+    <span class="n">A</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'A'</span><span class="p">)</span>
+    <span class="n">B</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'B'</span><span class="p">)</span>
+    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
 
-    <span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span>
+    <span class="n">C</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span>
         <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span>
-        <span class="k">lambda</span> <span class="n">ii</span><span class="p">,</span> <span class="n">jj</span><span class="p">:</span> <span class="n">tvm</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">ii</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">k</span><span cla [...]
+        <span class="k">lambda</span> <span class="n">ii</span><span class="p">,</span> <span class="n">jj</span><span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">ii</span><span class="p">,</span> <span class="n">k</span><span class="p">]</span> <span class="o">*</span> <span class="n">B</span><span class="p">[</span><span class="n">k</span><span cla [...]
         <span class="n">name</span><span class="o">=</span><span class="s">'C'</span><span class="p">)</span>
 
-    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="p">.</span><span class="n">op</span><span class="p">)</span>
+    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
 
-    <span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
-    <span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">)</span>
+    <span class="n">block_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">)</span>
+    <span class="n">thread_x</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">)</span>
 
-    <span class="n">bo</span><span class="p">,</span> <span class="n">bi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">factor</span><span clas [...]
-    <span class="n">to</span><span class="p">,</span> <span class="n">ti</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">factor</span><span clas [...]
-    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">bi</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">ti</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
+    <span class="n">bo</span><span class="p">,</span> <span class="n">bi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n"> [...]
+    <span class="n">to</span><span class="p">,</span> <span class="n">ti</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n"> [...]
+    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">bi</span><span class="p">,</span> <span class="n">block_x</span><span class="p">)</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">C</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">ti</span><span class="p">,</span> <span class="n">thread_x</span><span class="p">)</span>
 
-    <span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="p">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">simple_mode</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
+    <span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span> <span class="n">simple_mode</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
 
-    <span class="k">return</span> <span class="n">tvm</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span>
+    <span class="k">return</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">],</span>
     	<span class="s">"opencl"</span><span class="p">,</span>
     	<span class="n">target_host</span><span class="o">=</span><span class="s">"llvm -target=arm64-linux-android"</span><span class="p">,</span>
     	<span class="n">name</span><span class="o">=</span><span class="s">"gemm_gpu"</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>There’s nothing special except the last line. Here we set the target to ‘opencl’ since this is the compute language that our Mali GPU supports. Note that we set <code class="language-plaintext highlighter-rouge">target_host</code> to ‘<code class="language-plaintext highlighter-rouge">llvm -target=arm64-linux-android</code>’; the right value depends on the architecture of your Android phone. We tested on a Samsung Galaxy S6 Edge, which has a Mali-T760 GPU. Here is the CPU info for this phone:</p>
+<p>There’s nothing special except the last line. Here we set the target to ‘opencl’ since this is the compute language that our Mali GPU supports. Note that we set <code class="highlighter-rouge">target_host</code> to ‘<code class="highlighter-rouge">llvm -target=arm64-linux-android</code>’; the right value depends on the architecture of your Android phone. We tested on a Samsung Galaxy S6 Edge, which has a Mali-T760 GPU. Here is the CPU info for this phone:</p>
 
-<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>adb shell
-shell@zenltechn:/ <span class="nv">$ </span><span class="nb">cat</span> /proc/cpuinfo
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code><span class="gp">$ </span>adb shell
+shell@zenltechn:/ <span class="nv">$ </span>cat /proc/cpuinfo
 Processor	: AArch64 Processor rev 2 <span class="o">(</span>aarch64<span class="o">)</span>
 processor	: 0
 processor	: 1
@@ -256,58 +270,64 @@ CPU part	: 0xd03
 CPU revision	: 2
 
 Hardware	: SAMSUNG Exynos7420
-</code></pre></div></div>
+</code></pre>
+</div>
 
<p>Please refer to <a href="https://clang.llvm.org/docs/CrossCompilation.html#target-triple">target triple</a> to learn about the compile options for LLVM.</p>
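<p>For example, a 32-bit ARM Android device would need a different triple; an illustrative (untested) variant of the build call from the schedule above, with <code class="highlighter-rouge">f32</code> as a name of our choosing:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># Same schedule, but targeting a 32-bit ARM Android device (illustrative).
f32 = tvm.build(s, [A, B, C], "opencl",
                target_host="llvm -target=armv7a-linux-androideabi",
                name="gemm_gpu")
</code></pre></figure>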
 
-<p>We use <code class="language-plaintext highlighter-rouge">tvm.contrib.ndk</code> to build the shared library for the Android system,</p>
+<p>We use <code class="highlighter-rouge">tvm.contrib.ndk</code> to build the shared library for the Android system,</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span><span class="p">,</span> <span class="n">util</span><span class="p">,</span> <span class="n">ndk</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">tvm.contrib</span> <span class="kn">import</span> <span class="n">rpc</span><span class="p">,</span> <span class="n">util</span><span class="p">,</span> <span class="n">ndk</span>
 <span class="n">N</span> <span class="o">=</span> <span class="mi">1024</span>
 <span class="n">f</span> <span class="o">=</span> <span class="n">gemm</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">bn</span> <span class="o">=</span> <span class="mi">256</span><span class="p">)</span>
-<span class="n">temp</span> <span class="o">=</span> <span class="n">util</span><span class="p">.</span><span class="n">tempdir</span><span class="p">()</span>
-<span class="n">path_dso</span> <span class="o">=</span> <span class="n">temp</span><span class="p">.</span><span class="n">relpath</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
-<span class="n">f</span><span class="p">.</span><span class="n">export_library</span><span class="p">(</span><span class="n">path_dso</span><span class="p">,</span> <span class="n">ndk</span><span class="p">.</span><span class="n">create_shared</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="n">temp</span> <span class="o">=</span> <span class="n">util</span><span class="o">.</span><span class="n">tempdir</span><span class="p">()</span>
+<span class="n">path_dso</span> <span class="o">=</span> <span class="n">temp</span><span class="o">.</span><span class="n">relpath</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
+<span class="n">f</span><span class="o">.</span><span class="n">export_library</span><span class="p">(</span><span class="n">path_dso</span><span class="p">,</span> <span class="n">ndk</span><span class="o">.</span><span class="n">create_shared</span><span class="p">)</span>
+</code></pre>
+</div>
 
-<p><code class="language-plaintext highlighter-rouge">ndk.create_shared</code> reads the environment variable <code class="language-plaintext highlighter-rouge">TVM_NDK_CC</code> to find the compiler &amp; linker for the Android device. We can easily use NDK to generate standalone toolchain for our device. For example, the following commands generate standalone compilers and linkers for ARM64 Android devices.</p>
+<p><code class="highlighter-rouge">ndk.create_shared</code> reads the environment variable <code class="highlighter-rouge">TVM_NDK_CC</code> to find the compiler &amp; linker for the Android device. We can easily use NDK to generate standalone toolchain for our device. For example, the following commands generate standalone compilers and linkers for ARM64 Android devices.</p>
 
-<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt/android-ndk/build/tools/
-./make-standalone-toolchain.sh <span class="nt">--platform</span><span class="o">=</span>android-24 <span class="nt">--use-llvm</span> <span class="nt">--arch</span><span class="o">=</span>arm64 <span class="nt">--install-dir</span><span class="o">=</span>/opt/android-toolchain-arm64
-</code></pre></div></div>
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code><span class="nb">cd</span> /opt/android-ndk/build/tools/
+./make-standalone-toolchain.sh --platform<span class="o">=</span>android-24 --use-llvm --arch<span class="o">=</span>arm64 --install-dir<span class="o">=</span>/opt/android-toolchain-arm64
+</code></pre>
+</div>
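+
+<p>With the standalone toolchain in place, point <code class="highlighter-rouge">TVM_NDK_CC</code> at its C++ compiler before building. The exact binary name below depends on your NDK version, so treat it as an assumption and adjust it to your setup,</p>
+
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code># let ndk.create_shared find the cross compiler generated above;
+# the binary name may differ across NDK versions
+export TVM_NDK_CC=/opt/android-toolchain-arm64/bin/aarch64-linux-android-g++
+</code></pre>
+</div>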
 
 <p>If everything goes right, we’ve got a shared library ‘gemm_gpu.so’. Now let’s upload it to the mobile phone, have the phone load the module, and get a remote handle,</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">remote</span> <span class="o">=</span> <span class="n">rpc</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s">"0.0.0.0"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="s">"android"</span><span class="p">)</span>
 
-<span class="n">remote</span><span class="p">.</span><span class="n">upload</span><span class="p">(</span><span class="n">path_dso</span><span class="p">)</span>
-<span class="n">f</span> <span class="o">=</span> <span class="n">remote</span><span class="p">.</span><span class="n">load_module</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="n">remote</span><span class="o">.</span><span class="n">upload</span><span class="p">(</span><span class="n">path_dso</span><span class="p">)</span>
+<span class="n">f</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">load_module</span><span class="p">(</span><span class="s">"gemm_gpu.so"</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>Create the remote arrays and print the running time,</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="p">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">ctx</span> <span class="o">=</span> <span class="n">remote</span><span class="o">.</span><span class="n">cl</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
 
-<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
-<span class="n">a_np</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">)).</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span>
-<span class="n">b_np</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">)).</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span><span class="p">)</span>
+<span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+<span class="n">a_np</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span>< [...]
+<span class="n">b_np</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s">"float32"</span>< [...]
 
-<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">a_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
-<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">b_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
-<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"float32"</span><span  [...]
+<span class="n">a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">a_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
+<span class="n">b</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">b_np</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
+<span class="n">c</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">N</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s">"float32"</span><span  [...]
 
-<span class="n">time_f</span> <span class="o">=</span> <span class="n">f</span><span class="p">.</span><span class="n">time_evaluator</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">entry_name</span><span class="p">,</span> <span class="n">ctx</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
-<span class="n">cost</span> <span class="o">=</span> <span class="n">time_f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">).</span><span class="n">mean</span>
-<span class="k">print</span><span class="p">(</span><span class="s">'%g secs/op, %g GFLOPS'</span> <span class="o">%</span> <span class="p">(</span><span class="n">cost</span><span class="p">,</span> <span class="n">ngflops</span><span class="p">(</span><span class="n">N</span><span class="p">)</span> <span class="o">/</span> <span class="n">cost</span><span class="p">))</span>
-</code></pre></div></div>
+<span class="n">time_f</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">time_evaluator</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">entry_name</span><span class="p">,</span> <span class="n">ctx</span><span class="p">,</span> <span class="n">number</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
+<span class="n">cost</span> <span class="o">=</span> <span class="n">time_f</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span>
+<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="si">%</span><span class="s">g secs/op, </span><span class="si">%</span><span class="s">g GFLOPS'</span> <span class="o">%</span> <span class="p">(</span><span class="n">cost</span><span class="p">,</span> <span class="n">ngflops</span><span class="p">(</span><span class="n">N</span><span class="p">)</span> <span class="o">/</span> <span class="n">cost</span><span class="p">))</span>
+</code></pre>
+</div>
 
 <p>Now we can verify the results on the PC,</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="n">testing</span><span class="p">.</span><span class="n">assert_almost_equal</span><span class="p">(</span>
-	<span class="n">c</span><span class="p">.</span><span class="n">asnumpy</span><span class="p">(),</span>
-	<span class="n">a_np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">b_np</span><span class="p">),</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_almost_equal</span><span class="p">(</span>
+	<span class="n">c</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span>
+	<span class="n">a_np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">b_np</span><span class="p">),</span>
 	<span class="n">decimal</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>In the case above, we develop and cross-compile to a binary file for our mobile phone. Through the proxy server, the binary is uploaded to the phone and run in its JVM. This approach makes it easy to develop and test different computation workloads on Android.</p>
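+
+<p>For completeness, the proxy server mentioned above is started on the PC before the phone connects. A minimal sketch is shown below; the flags and their defaults may differ across TVM versions, so check <code class="highlighter-rouge">--help</code> for your build,</p>
+
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code># start the RPC proxy that the phone registers with and that
+# rpc.connect talks to; port 9090 matches the rpc.connect call above
+python -m tvm.exec.rpc_proxy --host 0.0.0.0 --port 9090
+</code></pre>
+</div>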
 
@@ -315,24 +335,25 @@ Hardware	: SAMSUNG Exynos7420
 
 <p>The Android app is built on top of the Java runtime, which provides minimal support for TVM Function and NDArray. Here’s an example of registering a function in tvm4j,</p>
 
-<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">Function</span> <span class="n">func</span> <span class="o">=</span> <span class="nc">Function</span><span class="o">.</span><span class="na">convertFunc</span><span class="o">(</span><span class="k">new</span> <span class="nc">Function</span><span class="o">.</span><span class="na">Callback</span><span class="o">()</span> <span class="o">{</span>
-      <span class="nd">@Override</span> <span class="kd">public</span> <span class="nc">Object</span> <span class="nf">invoke</span><span class="o">(</span><span class="nc">TVMValue</span><span class="o">...</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
-        <span class="nc">StringBuilder</span> <span class="n">res</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">StringBuilder</span><span class="o">();</span>
-        <span class="k">for</span> <span class="o">(</span><span class="nc">TVMValue</span> <span class="n">arg</span> <span class="o">:</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="n">Function</span> <span class="n">func</span> <span class="o">=</span> <span class="n">Function</span><span class="o">.</span><span class="na">convertFunc</span><span class="o">(</span><span class="k">new</span> <span class="n">Function</span><span class="o">.</span><span class="na">Callback</span><span class="o">()</span> <span class="o">{</span>
+      <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Object</span> <span class="nf">invoke</span><span class="o">(</span><span class="n">TVMValue</span><span class="o">...</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+        <span class="n">StringBuilder</span> <span class="n">res</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringBuilder</span><span class="o">();</span>
+        <span class="k">for</span> <span class="o">(</span><span class="n">TVMValue</span> <span class="n">arg</span> <span class="o">:</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
           <span class="n">res</span><span class="o">.</span><span class="na">append</span><span class="o">(</span><span class="n">arg</span><span class="o">.</span><span class="na">asString</span><span class="o">());</span>
         <span class="o">}</span>
         <span class="k">return</span> <span class="n">res</span><span class="o">.</span><span class="na">toString</span><span class="o">();</span>
       <span class="o">}</span>
     <span class="o">});</span>
-<span class="nc">TVMValue</span> <span class="n">res</span> <span class="o">=</span> <span class="n">func</span><span class="o">.</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"Hello"</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">" "</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"World!"</span><span class="o">).</span><span class="na">invoke</span [...]
+<span class="n">TVMValue</span> <span class="n">res</span> <span class="o">=</span> <span class="n">func</span><span class="o">.</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"Hello"</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">" "</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="s">"World!"</span><span class="o">).</span><span class="na">invoke</span> [...]
 <span class="n">assertEquals</span><span class="o">(</span><span class="s">"Hello World!"</span><span class="o">,</span> <span class="n">res</span><span class="o">.</span><span class="na">asString</span><span class="o">());</span>
 <span class="n">res</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
 <span class="n">func</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>As we have seen in the GEMM part, one can build a shared library with Python and execute it from Java,</p>
 
-<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">ml.dmlc.tvm.Module</span><span class="o">;</span>
+<div class="language-java highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">ml.dmlc.tvm.Module</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">ml.dmlc.tvm.NDArray</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">ml.dmlc.tvm.TVMContext</span><span class="o">;</span>
 
@@ -340,32 +361,34 @@ Hardware	: SAMSUNG Exynos7420
 <span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kd">public</span> <span class="kd">class</span> <span class="nc">LoadAddFunc</span> <span class="o">{</span>
-  <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="nc">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
-    <span class="nc">String</span> <span class="n">loadingDir</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
-    <span class="nc">Module</span> <span class="n">fadd</span> <span class="o">=</span> <span class="nc">Module</span><span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="n">loadingDir</span> <span class="o">+</span> <span class="nc">File</span><span class="o">.</span><span class="na">separator</span> <span class="o">+</span> <span class="s">"add_cpu.so"</span><span class="o">);</span>
+  <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
+    <span class="n">String</span> <span class="n">loadingDir</span> <span class="o">=</span> <span class="n">args</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
+    <span class="n">Module</span> <span class="n">fadd</span> <span class="o">=</span> <span class="n">Module</span><span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="n">loadingDir</span> <span class="o">+</span> <span class="n">File</span><span class="o">.</span><span class="na">separator</span> <span class="o">+</span> <span class="s">"add_cpu.so"</span><span class="o">);</span>
 
-    <span class="nc">TVMContext</span> <span class="n">ctx</span> <span class="o">=</span> <span class="nc">TVMContext</span><span class="o">.</span><span class="na">cpu</span><span class="o">();</span>
+    <span class="n">TVMContext</span> <span class="n">ctx</span> <span class="o">=</span> <span class="n">TVMContext</span><span class="o">.</span><span class="na">cpu</span><span class="o">();</span>
 
     <span class="kt">long</span><span class="o">[]</span> <span class="n">shape</span> <span class="o">=</span> <span class="k">new</span> <span class="kt">long</span><span class="o">[]{</span><span class="mi">2</span><span class="o">};</span>
-    <span class="nc">NDArray</span> <span class="n">arr</span> <span class="o">=</span> <span class="nc">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
+    <span class="n">NDArray</span> <span class="n">arr</span> <span class="o">=</span> <span class="n">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
     <span class="n">arr</span><span class="o">.</span><span class="na">copyFrom</span><span class="o">(</span><span class="k">new</span> <span class="kt">float</span><span class="o">[]{</span><span class="mi">3</span><span class="n">f</span><span class="o">,</span> <span class="mi">4</span><span class="n">f</span><span class="o">});</span>
-    <span class="nc">NDArray</span> <span class="n">res</span> <span class="o">=</span> <span class="nc">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
+    <span class="n">NDArray</span> <span class="n">res</span> <span class="o">=</span> <span class="n">NDArray</span><span class="o">.</span><span class="na">empty</span><span class="o">(</span><span class="n">shape</span><span class="o">,</span> <span class="n">ctx</span><span class="o">);</span>
 
     <span class="n">fadd</span><span class="o">.</span><span class="na">entryFunc</span><span class="o">().</span><span class="na">pushArg</span><span class="o">(</span><span class="n">arr</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="n">arr</span><span class="o">).</span><span class="na">pushArg</span><span class="o">(</span><span class="n">res</span><span class="o">).</span><span class="na">invoke</span><span class="o">();</span>
-    <span class="nc">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="nc">Arrays</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span><span class="n">res</span><span class="o">.</span><span class="na">asFloatArray</span><span class="o">()));</span>
+    <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">toString</span><span class="o">(</span><span class="n">res</span><span class="o">.</span><span class="na">asFloatArray</span><span class="o">()));</span>
 
     <span class="n">arr</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
     <span class="n">res</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
     <span class="n">fadd</span><span class="o">.</span><span class="na">release</span><span class="o">();</span>
   <span class="o">}</span>
 <span class="o">}</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>Once you have built the TVM library following the <a href="http://docs.tvmlang.org/how_to/install.html">Installation Guide</a>, run</p>
 
-<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make jvmpkg
+<div class="language-bash highlighter-rouge"><pre class="highlight"><code>make jvmpkg
 make jvminstall
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>This will compile, package, and install tvm4j in your local Maven repository. Please refer to <a href="https://github.com/dmlc/tvm/tree/master/jvm">tvm4j</a> for more information.</p>
 
@@ -378,41 +401,37 @@ make jvminstall
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All right reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
+
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
+
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
 
-</section>
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/01/16/opt-mali-gpu.html b/2018/01/16/opt-mali-gpu.html
index 71d3d86..8e82ff8 100644
--- a/2018/01/16/opt-mali-gpu.html
+++ b/2018/01/16/opt-mali-gpu.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Optimizing Mobile Deep Learning on ARM GPU with TVM</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Optimizing Mobile Deep Learning on ARM GPU with TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-01-16T00:00:00-05:00" itemprop="datePublished">
+        <time datetime="2018-01-16T00:00:00-08:00" itemprop="datePublished">
           Jan 16, 2018
         </time>
         
@@ -218,7 +228,7 @@ not require explicit vectorization. But also notice that the newer
 Mali Bifrost GPUs are based on quad-style vectorization and do not
 require explicit vectorization.</li>
   <li>All threads in Mali GPUs have individual program counters. It means
-the <code class="language-plaintext highlighter-rouge">warp size</code> is 1, so that branch divergence is not a major problem.</li>
+the <code class="highlighter-rouge">warp size</code> is 1, so that branch divergence is not a major problem.</li>
 </ul>
 
 <h1 id="optimization--convolution-as-example">Optimization : Convolution as Example</h1>
@@ -289,59 +299,61 @@ tiling so that we can access the memory sequentially, which reduces
 the cache miss rate.</p>
 
 <p>We do tiling on the width dimension of the input image and CO dimension
-of the filter matrix.  This is described by <code class="language-plaintext highlighter-rouge">tvm.compute</code>.</p>
+of the filter matrix.  This is described by <code class="highlighter-rouge">tvm.compute</code>.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tiling factor
-</span><span class="n">VH</span> <span class="o">=</span> <span class="mi">1</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># set tiling factor</span>
+<span class="n">VH</span> <span class="o">=</span> <span class="mi">1</span>
 <span class="n">VW</span> <span class="o">=</span> <span class="n">VC</span> <span class="o">=</span> <span class="mi">4</span>
 
-<span class="c1"># get input shape
-</span> <span class="n">_</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">IH</span><span class="p">,</span> <span class="n">IW</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span>
-<span class="n">CO</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span> <span class="o">=</span> <span class="n">kernel</span><span class="p">.</span><span class="n">shape</span>
+<span class="c"># get input shape</span>
+ <span class="n">_</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">IH</span><span class="p">,</span> <span class="n">IW</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">shape</span>
+<span class="n">CO</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span> <span class="o">=</span> <span class="n">kernel</span><span class="o">.</span><span class="n">shape</span>
 <span class="n">TH</span> <span class="o">=</span> <span class="n">IH</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">H_PAD</span>
 <span class="n">TW</span> <span class="o">=</span> <span class="n">IW</span> <span class="o">+</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">W_PAD</span>
 
-<span class="c1"># calc output shape
-</span><span class="n">OH</span> <span class="o">=</span> <span class="p">(</span><span class="n">IH</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">H_PAD</span> <span class="o">-</span> <span class="n">KH</span><span class="p">)</span> <span class="o">//</span> <span class="n">H_STR</span> <span class="o">+</span> <span class="mi">1</span>
+<span class="c"># calc output shape</span>
+<span class="n">OH</span> <span class="o">=</span> <span class="p">(</span><span class="n">IH</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">H_PAD</span> <span class="o">-</span> <span class="n">KH</span><span class="p">)</span> <span class="o">//</span> <span class="n">H_STR</span> <span class="o">+</span> <span class="mi">1</span>
 <span class="n">OW</span> <span class="o">=</span> <span class="p">(</span><span class="n">IW</span> <span class="o">+</span> <span class="mi">2</span><span class="o">*</span><span class="n">W_PAD</span> <span class="o">-</span> <span class="n">KW</span><span class="p">)</span> <span class="o">//</span> <span class="n">W_STR</span> <span class="o">+</span> <span class="mi">1</span>
 
-<span class="c1"># data shape after packing
-</span><span class="n">dvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">TH</span> <span class="o">//</span> <span class="p">(</span><span class="n">VH</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="p">),</span> <span class="n">TW</span> <span class="o">//</span> <span class="p">(</span><span class="n">VW</span><span class="o">*</span><span class="n">W_STRIDE</span><span class="p">), [...]
+<span class="c"># data shape after packing</span>
+<span class="n">dvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">TH</span> <span class="o">//</span> <span class="p">(</span><span class="n">VH</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="p">),</span> <span class="n">TW</span> <span class="o">//</span> <span class="p">(</span><span class="n">VW</span><span class="o">*</span><span class="n">W_STRIDE</span><span class="p">),</span> [...]
 
-<span class="c1"># kernel shape after packing
-</span><span class="n">kvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">CO</span> <span class="o">//</span> <span class="n">VC</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span><span class="p">,</span> <span class="n">VC</span><span class="p">)</span>
+<span class="c"># kernel shape after packing</span>
+<span class="n">kvshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">CO</span> <span class="o">//</span> <span class="n">VC</span><span class="p">,</span> <span class="n">CI</span><span class="p">,</span> <span class="n">KH</span><span class="p">,</span> <span class="n">KW</span><span class="p">,</span> <span class="n">VC</span><span class="p">)</span>
 
 <span class="n">ovshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">CO</span> <span class="o">//</span> <span class="n">VC</span><span class="p">,</span> <span class="n">OH</span> <span class="o">//</span> <span class="n">VH</span><span class="p">,</span> <span class="n">OW</span> <span class="o">//</span> <span class="n">VW</span><span class="p">,</span> <span class="n">VH</span><span class="p">,</span> <span c [...]
 <span class="n">oshape</span> <span class="o">=</span> <span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">CO</span><span class="p">,</span> <span class="n">OH</span><span class="p">,</span> <span class="n">OW</span><span class="p">)</span>
 
-<span class="c1"># define packing
-</span><span class="n">data_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span><span class="n">dvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">, [...]
+<span class="c"># define packing</span>
+<span class="n">data_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">dvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> [...]
     <span class="n">data_pad</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">ci</span><span class="p">][</span><span class="n">h</span><span class="o">*</span><span class="n">VH</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="o">+</span><span class="n">vh</span><span class="p">][</span><span class="n">w</span><span class="o">*</span><span class="n">VW</span><span class="o">*</span><span class="n">W_STRIDE</span><spa [...]
 
-<span class="n">kernel_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span><span class="n">kvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">:</span>
+<span class="n">kernel_vec</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">kvshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">:</span>
     <span class="n">kernel</span><span class="p">[</span><span class="n">co</span><span class="o">*</span><span class="n">VC</span><span class="o">+</span><span class="n">vc</span><span class="p">][</span><span class="n">ci</span><span class="p">][</span><span class="n">kh</span><span class="p">][</span><span class="n">kw</span><span class="p">],</span> <span class="n">name</span><span class="o">=</span><span class="s">'kernel_vec'</span><span class="p">)</span>
 
-<span class="c1"># define convolution
-</span><span class="n">ci</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">CI</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ci'</span><span class="p">)</span>
-<span class="n">kh</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KH</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kh'</span><span class="p">)</span>
-<span class="n">kw</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KW</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kw'</span><span class="p">)</span>
+<span class="c"># define convolution</span>
+<span class="n">ci</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">CI</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ci'</span><span class="p">)</span>
+<span class="n">kh</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KH</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kh'</span><span class="p">)</span>
+<span class="n">kw</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">KW</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'kw'</span><span class="p">)</span>
 
-<span class="n">conv</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span><span class="n">ovshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <sp [...]
-    <span class="n">tvm</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">data_vec</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="o">+</span><span class="n">kh</span><span  [...]
-            <span class="n">kernel_vec</span><span class="p">[</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="n">out_dtype</span><span class="p">),</span>
+<span class="n">conv</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">ovshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <sp [...]
+    <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">data_vec</span><span class="p">[</span><span class="n">n</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="o">*</span><span class="n">H_STRIDE</span><span class="o">+</span><span class="n">kh</span><span  [...]
+            <span class="n">kernel_vec</span><span class="p">[</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span><span class="p">]</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">out_dtype</span><span class="p">),</span>
             <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">]),</span> <span class="n">name</span><span class="o">=</span><span class="s">'conv'</span><span class="p">)</span>
 
-<span class="c1"># unpack to correct layout
-</span><span class="n">output</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">(</span><span class="n">oshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">:</span>
+<span class="c"># unpack to correct layout</span>
+<span class="n">output</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">(</span><span class="n">oshape</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">:</span>
                      <span class="n">conv</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">co</span><span class="o">//</span><span class="n">VC</span><span class="p">][</span><span class="n">h</span><span class="o">/</span><span class="n">VH</span><span class="p">][</span><span class="n">w</span><span class="o">//</span><span class="n">VW</span><span class="p">][</span><span class="n">h</span><span class="o">%</span><span class="n">VH</span>< [...]
                      <span class="n">name</span><span class="o">=</span><span class="s">'output_unpack'</span><span class="p">,</span> <span class="n">tag</span><span class="o">=</span><span class="s">'direct_conv_output'</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>We can inspect the defined IR by</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="p">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">data</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">output</span><span class="p">],</span> <span [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">data</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">output</span><span class="p">],</span> <span class="n">simple_mode< [...]
+</code></pre>
+</div>
 <p>I pick the convolution part here.</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>produce conv {
+<div class="highlighter-rouge"><pre class="highlight"><code>produce conv {
   for (co, 0, 64) {
     for (h, 0, 56) {
       for (w, 0, 14) {
@@ -365,7 +377,8 @@ of the filter matrix.  This is described by <code class="language-plaintext high
     }
   }
 }
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <h3 id="kernel-1-bind-thread">Kernel 1: bind thread</h3>
 <p>In TVM, we first declare the computation and then <em>schedule</em> it.
@@ -375,42 +388,43 @@ is from <a href="http://halide-lang.org/">Halide</a>).</p>
 <p>The following schedule simply binds axes to GPU threads, so that
 our code can run on the Mali GPU.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># helper function for binding thread
-</span><span class="k">def</span> <span class="nf">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">tensor</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">z_factor</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">y_factor</span><span cla [...]
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># helper function for binding thread</span>
+<span class="k">def</span> <span class="nf">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">tensor</span><span class="p">,</span> <span class="n">z</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">z_factor</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">y_factor</span><span class="o"> [...]
     <span class="s">""" tile and bind 3d """</span>
     <span class="n">y_factor</span> <span class="o">=</span> <span class="n">y_factor</span> <span class="ow">or</span> <span class="n">z_factor</span>
     <span class="n">x_factor</span> <span class="o">=</span> <span class="n">x_factor</span> <span class="ow">or</span> <span class="n">y_factor</span>
-    <span class="n">zo</span><span class="p">,</span> <span class="n">zi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">z_factor</span><span class="p">)</span>
-    <span class="n">yo</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y_factor</span><span class="p">)</span>
-    <span class="n">xo</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">split</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x_factor</span><span class="p">)</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">zo</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.z"</span><span class="p">))</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">zi</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.z"</span><span class="p">))</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">yo</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">))</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">yi</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.y"</span><span class="p">))</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">xo</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
-    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">].</span><span class="n">bind</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span>
-
-<span class="c1"># set tunable parameter
-</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
-
-<span class="c1"># schedule data packing
-</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+    <span class="n">zo</span><span class="p">,</span> <span class="n">zi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">z_factor</span><span class="p">)</span>
+    <span class="n">yo</span><span class="p">,</span> <span class="n">yi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y_factor</span><span class="p">)</span>
+    <span class="n">xo</span><span class="p">,</span> <span class="n">xi</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x_factor</span><span class="p">)</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">zo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.z"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">zi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.z"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">yo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.y"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">yi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.y"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">xo</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"blockIdx.x"</span><span class="p">))</span>
+    <span class="n">s</span><span class="p">[</span><span class="n">tensor</span><span class="p">]</span><span class="o">.</span><span class="n">bind</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">thread_axis</span><span class="p">(</span><span class="s">"threadIdx.x"</span><span class="p">))</span>
+
+<span class="c"># set tunable parameter</span>
+<span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+
+<span class="c"># schedule data packing</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">ax [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="c1"># schedule kernel packing
-</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="c"># schedule kernel packing</span>
+<span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="c1"># schedule conv
-</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class=" [...]
-<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">reduce_axis</span>
+<span class="c"># schedule conv</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</sp [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
 
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">, [...]
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>With this schedule, our code can now run, but its performance is terrible.</p>
 
@@ -442,43 +456,44 @@ our code can run on Mali GPU.</p>
 <h3 id="kernel-2-unrolling">Kernel 2: unrolling</h3>
 <p>Loop unrolling can reduce the instruction overhead of loop control,
 reduce branch penalties, and hide the latency of memory reads.
-In TVM, this can be done easily by calling <code class="language-plaintext highlighter-rouge">s.unroll(axis)</code></p>
+In TVM, this can be done easily by calling <code class="highlighter-rouge">s.unroll(axis)</code>.</p>
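To see the effect in isolation, here is a minimal sketch (illustrative; the toy computation and the factor 4 are assumptions, not this post's workload) in which the unrolled inner loop becomes four explicit statements in the lowered IR. The full schedule below applies unroll to the packing stages and the convolution itself.

import tvm

n = 16
A = tvm.placeholder((n,), name='A')
B = tvm.compute((n,), lambda i: A[i] * 2.0, name='B')

s = tvm.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=4)
s[B].unroll(xi)  # replaces the inner loop of extent 4 with straight-line code

print(tvm.lower(s, [A, B], simple_mode=True))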
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tunable parameter
-</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># set tunable parameter</span>
+<span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
 
-<span class="c1"># schedule data packing
-</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="c"># schedule data packing</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">ax [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
 <span class="s">"""!! ADD UNROLL HERE !!"""</span>
-<span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
 
-<span class="c1"># schedule kernel packing
-</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="c"># schedule kernel packing</span>
+<span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
 <span class="s">"""!! ADD UNROLL HERE !!"""</span>
-<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
 
-<span class="c1"># schedule conv
-</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class=" [...]
-<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">reduce_axis</span>
+<span class="c"># schedule conv</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</sp [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
 
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">, [...]
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
 <span class="s">"""!! ADD UNROLL HERE !!"""</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
 
-<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <table>
   <thead>
@@ -515,43 +530,44 @@ In TVM, this can be done easily by calling <code class="language-plaintext highl
 <p>As mentioned before, we need to do vectorization explicitly
  in order to achieve the best performance on Mali GPU.</p>
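A minimal sketch of explicit vectorization (illustrative only; the factor 4 assumes float32 lanes in a 128-bit register):

import tvm

n = 1024
A = tvm.placeholder((n,), name='A')
B = tvm.compute((n,), lambda i: A[i] + 1.0, name='B')

s = tvm.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=4)  # 4 x float32 = 128 bits
s[B].vectorize(xi)  # the inner loop is emitted as a vector op, e.g. float4 in OpenCL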
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># set tunable parameter
-</span><span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="c"># set tunable parameter</span>
+<span class="n">num_thread</span> <span class="o">=</span> <span class="mi">8</span>
 
-<span class="c1"># schedule data packing
-</span><span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="c"># schedule data packing</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">ax [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">data_vec</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="c1"># unroll
-</span><span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="c"># unroll</span>
+<span class="n">s</span><span class="p">[</span><span class="n">data_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
 
-<span class="c1"># schedule kernel packing
-</span><span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="c"># schedule kernel packing</span>
+<span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">kernel_vec</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">ci</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="c1"># unroll
-</span><span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="c"># unroll</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
 <span class="s">"""!! VECTORIZE HERE !!"""</span>
-<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">].</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">kernel_vec</span><span class="p">]</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
 
-<span class="c1"># schedule conv
-</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class=" [...]
-<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">reduce_axis</span>
+<span class="c"># schedule conv</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">vw</span><span class="p">,</span> <span class="n">vc</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</sp [...]
+<span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">,</span> <span class="n">kw</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">reduce_axis</span>
 
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh</span><span class="p">, [...]
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">reorder</span><span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">vh</span><span class="p">,</span> <span class="n">kc</span><span class="p">,</span> <span class="n">kh< [...]
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">conv</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
 
-<span class="c1"># unroll
-</span><span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
+<span class="c"># unroll</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kh</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">kw</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">unroll</span><span class="p">(</span><span class="n">vw</span><span class="p">)</span>
 <span class="s">"""!! VECTORIZE HERE !!"""</span>
-<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">].</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
+<span class="n">s</span><span class="p">[</span><span class="n">conv</span><span class="p">]</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="n">vc</span><span class="p">)</span>
 
-<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">].</span><span class="n">op</span><span class="p">.</span><span class="n">axis</span>
+<span class="n">_</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span> <span class="o">=</span> <span class="n">s</span><span class="p">[</span><span class="n">output</span><span class="p">]</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">axis</span>
 <span class="n">tile_and_bind3d</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">output</span><span class="p">,</span> <span class="n">co</span><span class="p">,</span> <span class="n">oh</span><span class="p">,</span> <span class="n">ow</span><span class="p">,</span> <span class="n">num_thread</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <table>
   <thead>
@@ -592,7 +608,7 @@ In TVM, this can be done easily by calling <code class="language-plaintext highl
 
 <h3 id="how-to-set-the-tunable-parameter">How to set the tunable parameter</h3>
 <p>As for the tunable parameters above, some can be calculated directly.
-For the vectorized dimension <code class="language-plaintext highlighter-rouge">VC</code>, we should fill the 128-bit register,
+For the vectorized dimension <code class="highlighter-rouge">VC</code>, we should fill the 128-bit register,
 so it can be set to 128/32 = 4 for float32 and 128/16 = 8 for float16.</p>
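A hypothetical helper (the function name vc_for_dtype is made up for illustration, not from the post) that encodes this rule:

def vc_for_dtype(dtype, register_bits=128):
    """Pick VC so that VC lanes of `dtype` fill one vector register."""
    bits = int(''.join(ch for ch in dtype if ch.isdigit()))
    return register_bits // bits

assert vc_for_dtype('float32') == 4  # 128 / 32
assert vc_for_dtype('float16') == 8  # 128 / 16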
 
 <p>But more often we cannot determine the optimal value, due to the
@@ -602,8 +618,9 @@ IR rather than direct OpenCL code.</p>
 
 <h3 id="the-generated-opencl-code">The generated OpenCL code</h3>
 <p>We can view the generated OpenCL code by</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">func</span><span class="p">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get_source</span><span class="p">())</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">func</span><span class="o">.</span><span class="n">imported_modules</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">get_source</span><span class="p">())</span>
+</code></pre>
+</div>
 <p>The OpenCL code is too long to be pasted here, and it is hard to read due
 to heavy unrolling. If interested, you can view it
 <a href="https://github.com/merrymercy/tvm-mali/blob/master/data/kernels.cl">here</a>.</p>
@@ -613,14 +630,15 @@ to heavy unrolling. If interested, you can view it
 different backends on some popular deep neural networks.
 Our test environment is</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Firefly-RK3399 4G
+<div class="highlighter-rouge"><pre class="highlight"><code>Firefly-RK3399 4G
 CPU: dual-core Cortex-A72 + quad-core Cortex-A53
 GPU: Mali-T860MP4
 
 Arm Compute Library : v17.12
 MXNet: v1.0.1
 Openblas: v0.2.18
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>We use NNVM and TVM to do end-to-end compilation.</p>
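A hedged sketch of that flow with the NNVM API of the time (the net and params objects are assumptions here; a real script would obtain them from a frontend such as nnvm.frontend.from_mxnet, and the target strings are examples for the RK3399 board):

import nnvm.compiler
import tvm

# net, params are assumed to come from an NNVM frontend
target = tvm.target.mali()                      # OpenCL target with Mali schedules
target_host = 'llvm -target=aarch64-linux-gnu'  # host-side code for the ARM CPU

graph, lib, params = nnvm.compiler.build(
    net, target=target, shape={'data': (1, 3, 224, 224)},
    params=params, target_host=target_host)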
 
@@ -724,41 +742,37 @@ advice and <a href="https://github.com/yzhliu">Yizhi Liu</a> for his earlier wor
 </div>
 </div>
 
 </html>
diff --git a/2018/03/12/webgl.html b/2018/03/12/webgl.html
index db05f52..988707c 100644
--- a/2018/03/12/webgl.html
+++ b/2018/03/12/webgl.html
@@ -1,146 +1,156 @@
     <title>Compiling Deep Learning Models to WebGL with TVM</title>
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Compiling Deep Learning Models to WebGL with TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-03-12T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-03-12T00:00:00-07:00" itemprop="datePublished">
           Mar 12, 2018
         </time>
         
@@ -266,41 +276,37 @@ optimizations into the TVM stack.</p>
 </div>
 </div>
 
 </html>
diff --git a/2018/03/23/nmt-transformer-optimize.html b/2018/03/23/nmt-transformer-optimize.html
index 9ec078f..211df5e 100644
--- a/2018/03/23/nmt-transformer-optimize.html
+++ b/2018/03/23/nmt-transformer-optimize.html
@@ -1,146 +1,156 @@
     <title>Bringing TVM into TensorFlow for Optimizing Neural Machine Translation on GPU</title>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Bringing TVM into TensorFlow for Optimizing Neural Machine Translation on GPU </h1>
       <p class="post-meta">
-        <time datetime="2018-03-23T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-03-23T00:00:00-07:00" itemprop="datePublished">
           Mar 23, 2018
         </time>
         
@@ -179,12 +189,13 @@ One particular challenge we observed is that batch matmul is a major performance
 
 <p>Batch matmul computation can be described more concretely as follows:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>void BatchedGemm(input A, input B, output C, M, N, K, batch_dimension) {
+<div class="highlighter-rouge"><pre class="highlight"><code>void BatchedGemm(input A, input B, output C, M, N, K, batch_dimension) {
   for (int i = 0; i &lt; batch_dimension; ++i)  {
     DoGemm(A[i],B[i],C[i],M,N,K)
   }
 }
-</code></pre></div></div>
+</code></pre>
+</div>
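+
+<p>To make the loop’s semantics concrete, here is a minimal NumPy sketch (an editorial illustration, not code from the original post) checking that the per-batch GEMM loop matches a single broadcasted matmul:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import numpy as np
+
+batch, M, K, N = 4, 2, 3, 5
+A = np.random.rand(batch, M, K).astype("float32")
+B = np.random.rand(batch, K, N).astype("float32")
+
+# the batched loop from the pseudocode above
+C = np.empty((batch, M, N), dtype="float32")
+for i in range(batch):
+    C[i] = A[i] @ B[i]
+
+# np.matmul broadcasts over the leading batch dimension
+np.testing.assert_allclose(C, np.matmul(A, B), rtol=1e-5)
+</code></pre>
+</div>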
 
 <h4 id="batch-matmul-shapes">Batch matmul shapes</h4>
 
@@ -248,14 +259,15 @@ One particular challenge we observed is that batch matmul is a major performance
 
 <p>In TVM, a general batch matmul computation can be declared as:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+<div class="highlighter-rouge"><pre class="highlight"><code># computation representation
 A = tvm.placeholder((batch, M, K), name='A')
 B = tvm.placeholder((batch, K, N), name='B')
 k = tvm.reduce_axis((0, K), 'k')
 C = tvm.compute((batch, M, N),
          lambda b, y, x: tvm.sum(A[b, y, k] * B[b, k, x], axis = k),
          name = 'C')
-</code></pre></div></div>
+</code></pre>
+</div>
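+
+<p>As a quick sanity check, this declaration can be built with a default schedule and its lowered loop nest inspected; the snippet below is a sketch against the TVM 0.x API used throughout this post (<code class="highlighter-rouge">tvm.create_schedule</code> and <code class="highlighter-rouge">tvm.lower</code> are assumed):</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>s = tvm.create_schedule(C.op)
+# print the naive triple loop nest before any GPU-specific scheduling is applied
+print(tvm.lower(s, [A, B, C], simple_mode=True))
+</code></pre>
+</div>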
 
 <h2 id="schedule-optimization">Schedule optimization</h2>
 
@@ -263,7 +275,7 @@ C = tvm.compute((batch, M, N),
 
 <h3 id="tuning-parameters-of-blockthread-numbers">Tuning parameters of block/thread numbers</h3>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # thread indices
+<div class="highlighter-rouge"><pre class="highlight"><code>  # thread indices
   block_y = tvm.thread_axis("blockIdx.y")
   block_x = tvm.thread_axis("blockIdx.x")
   thread_y = tvm.thread_axis((0, num_thread_y), "threadIdx.y")
@@ -287,10 +299,11 @@ C = tvm.compute((batch, M, N),
   s[C].bind(tx, thread_x)
   s[C].bind(vty, thread_yz)
   s[C].bind(vtx, thread_xz)
-</code></pre></div></div>
-<p>We fuse the outer dimensions of the batch matmul, i.e. the BB and FF of the op’s dimension, normally known as “batch” dimension in batch matmul computation. Then we split the outer and the inner dimensions by a factor of (<code class="language-plaintext highlighter-rouge">number_thread * vthread</code>).</p>
+</code></pre>
+</div>
+<p>We fuse the outer dimensions of the batch matmul, i.e. the BB and FF axes of the op, normally known as the “batch” dimension in batch matmul computation. Then we split the outer and the inner dimensions by a factor of (<code class="highlighter-rouge">number_thread * vthread</code>).</p>
 
-<p>Strided pattern is not needed in batch matmul, thus the virtual thread number (<code class="language-plaintext highlighter-rouge">vthread_y</code> and <code class="language-plaintext highlighter-rouge">vthread_x</code>) are both set to 1.</p>
+<p>A strided access pattern is not needed in batch matmul, so the virtual thread numbers (<code class="highlighter-rouge">vthread_y</code> and <code class="highlighter-rouge">vthread_x</code>) are both set to 1.</p>
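+
+<p>The fuse-then-split pattern described above looks roughly as follows; the axis names and factors are assumed here, since the full schedule is elided in this excerpt:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>b, y, x = s[C].op.axis        # batch, row, column axes of C
+fused = s[C].fuse(b, y)       # fuse the outer "batch"-like dimensions
+# split(factor=f) returns an (outer, inner) pair of axes
+bo, bi = s[C].split(fused, factor=num_thread_y * vthread_y)
+vty, ty = s[C].split(bi, factor=num_thread_y)
+</code></pre>
+</div>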
 
 <h4 id="finding-the-best-combination-of-number_thread">Finding the best combination of number_thread</h4>
 
@@ -339,7 +352,7 @@ C = tvm.compute((batch, M, N),
   </tbody>
 </table>
 
-<p>As learned from <a href="http://tvmlang.org/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html">past experience</a>, the method to find the best combination of <code class="language-plaintext highlighter-rouge">num_thread_y</code> and <code class="language-plaintext highlighter-rouge">num_thread_x</code> is through brute-force search. After a brute-force search, the best combination for current shape can be found, which in current computation [...]
+<p>As learned from <a href="http://tvmlang.org/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html">past experience</a>, the best combination of <code class="highlighter-rouge">num_thread_y</code> and <code class="highlighter-rouge">num_thread_x</code> is found through brute-force search. For the current shape and computation, the best combination is <code class="highlighter-rouge">nu [...]
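+
+<p>Such a sweep can be as simple as timing every candidate pair. The sketch below assumes a hypothetical <code class="highlighter-rouge">schedule_batch_matmul(ty, tx)</code> helper that applies the binding shown earlier, plus prepared device arrays <code class="highlighter-rouge">a</code>, <code class="highlighter-rouge">b</code>, <code class="highlighter-rouge">c</code>:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import itertools
+
+best = None
+for ty, tx in itertools.product([1, 2, 4, 8, 16, 32], repeat=2):
+    s = schedule_batch_matmul(ty, tx)          # hypothetical scheduling helper
+    func = tvm.build(s, [A, B, C], target='cuda')
+    timer = func.time_evaluator(func.entry_name, tvm.gpu(0), number=100)
+    cost = timer(a, b, c).mean                 # average runtime in seconds
+    if best is None or cost &lt; best[0]:
+        best = (cost, ty, tx)
+print(best)
+</code></pre>
+</div>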
 
 <h2 id="fuse-batch-matmul-with-other-operations">Fuse batch matmul with other operations</h2>
 
@@ -349,7 +362,7 @@ C = tvm.compute((batch, M, N),
 
 <p>Batch matmul and broadcast add fusion computation can be declared as follows:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+<div class="highlighter-rouge"><pre class="highlight"><code># computation representation
 A = tvm.placeholder((batch_size, features, M, K), name='A')
 # the shape of B is (N, K) rather than (K, N) because B is transposed in this fusion pattern
 B = tvm.placeholder((batch_size, features, N, K), name='B')
@@ -360,11 +373,12 @@ C = tvm.compute(
            lambda yb, yf, m, x: tvm.sum(A[yb, yf, m, k] * B[yb, yf, x, k], axis = k),
            name = 'C')
 D = topi.broadcast_add(C, ENTER)
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>Batch matmul and transpose fusion computation can be declared as:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># computation representation
+<div class="highlighter-rouge"><pre class="highlight"><code># computation representation
 A = tvm.placeholder((batch_size, features, M, K), name='A')
 B = tvm.placeholder((batch_size, features, K, N), name='B')
 k = tvm.reduce_axis((0, K), 'k')
@@ -372,16 +386,17 @@ C = tvm.compute(
            (batch_size, M, features, N),
            lambda yb, m, yf, x: tvm.sum(A[yb, yf, m, k] * B[yb, yf, k, x], axis = k),
            name = 'C')
-</code></pre></div></div>
+</code></pre>
+</div>
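+
+<p>Note that the fusion only changes the layout in which <code class="highlighter-rouge">C</code> is written out. A small NumPy check (illustrative only) confirms that producing the (batch_size, M, features, N) layout directly equals a plain batch matmul followed by an explicit transpose:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import numpy as np
+
+batch, features, M, N, K = 64, 8, 1, 17, 128
+A = np.random.rand(batch, features, M, K).astype("float32")
+B = np.random.rand(batch, features, K, N).astype("float32")
+
+# plain batch matmul, then an explicit transpose of the two middle axes
+ref = np.einsum("bfmk,bfkn->bfmn", A, B).transpose(0, 2, 1, 3)
+# the fused compute produces this layout in a single pass
+fused = np.einsum("bfmk,bfkn->bmfn", A, B)
+np.testing.assert_allclose(ref, fused, rtol=1e-5)
+</code></pre>
+</div>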
 <h3 id="fusion-kernel-performance">Fusion Kernel Performance</h3>
 
 <p>The shape of [batch=64, heads=8, M=1, N=17, K=128] is chosen to illustrate the performance of the generated code. A sequence length of 17 is chosen since it is the average input length in our production scenarios.</p>
 
 <ul>
-  <li>tf-r1.4 <code class="language-plaintext highlighter-rouge">BatchMatmul</code>: 513.9 us</li>
-  <li>tf-r1.4 <code class="language-plaintext highlighter-rouge">BatchMatmul</code> + <code class="language-plaintext highlighter-rouge">Transpose</code> (separate): 541.9 us</li>
-  <li>TVM <code class="language-plaintext highlighter-rouge">BatchMatmul</code>: 37.62 us</li>
-  <li>TVM <code class="language-plaintext highlighter-rouge">BatchMatmul</code> + <code class="language-plaintext highlighter-rouge">Transpose</code> (fused): 38.39 us</li>
+  <li>tf-r1.4 <code class="highlighter-rouge">BatchMatmul</code>: 513.9 us</li>
+  <li>tf-r1.4 <code class="highlighter-rouge">BatchMatmul</code> + <code class="highlighter-rouge">Transpose</code> (separate): 541.9 us</li>
+  <li>TVM <code class="highlighter-rouge">BatchMatmul</code>: 37.62 us</li>
+  <li>TVM <code class="highlighter-rouge">BatchMatmul</code> + <code class="highlighter-rouge">Transpose</code> (fused): 38.39 us</li>
 </ul>
 
 <p>The kernel fusion optimization brings a further <b><em>1.7X</em></b> speed-up.</p>
@@ -412,41 +427,37 @@ C = tvm.compute(
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All right reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/07/12/vta-release-announcement.html b/2018/07/12/vta-release-announcement.html
index 7155faa..08c2b6e 100644
--- a/2018/07/12/vta-release-announcement.html
+++ b/2018/07/12/vta-release-announcement.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>VTA: An Open, Customizable Deep Learning Acceleration Stack </title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
+
 
-</div>
 
 
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>VTA: An Open, Customizable Deep Learning Acceleration Stack  </h1>
       <p class="post-meta">
-        <time datetime="2018-07-12T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-07-12T00:00:00-07:00" itemprop="datePublished">
           Jul 12, 2018
         </time>
         
@@ -158,7 +168,7 @@
 
 <p>VTA is more than a standalone accelerator design: it’s an end-to-end solution that includes drivers, a JIT runtime, and an optimizing compiler stack based on TVM. The current release includes a behavioral hardware simulator, as well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast prototyping. By extending the TVM stack with a customizable, open-source deep learning hardware accelerator design, we are exposing a transparent end-to-end deep learning stack from [...]
 
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png" alt="image" width="50%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png" alt="image" width="50%" /></p>
 
 <p>The VTA and TVM stack together constitute a blueprint for an end-to-end, accelerator-centric deep learning system that can:</p>
 
@@ -213,7 +223,7 @@ The extendability of the compiler stack, combined with the ability to modify the
 <p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning accelerator built around a GEMM core, which performs dense matrix multiplication at a high computational throughput.
 The design is inspired by mainstream deep learning accelerators, such as Google’s TPU. It adopts decoupled access-execute to hide memory access latency and maximize utilization of compute resources. More broadly, VTA can serve as a template deep learning accelerator design, exposing a clean tensor computation abstraction to the compiler stack.</p>
 
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png" alt="image" width="60%" /></p>
 
 <p>The figure above presents a high-level overview of the VTA hardware organization. VTA is composed of four modules that communicate with each other via FIFO queues and single-writer/single-reader SRAM memory blocks, allowing for task-level pipeline parallelism.
 The compute module performs both dense linear algebra computation with its GEMM core and general computation with its tensor ALU.
@@ -230,7 +240,7 @@ The first approach, which doesn’t require special hardware, is to run deep lear
 This simulator back-end is readily available for developers to experiment with.
 The second approach relies on an off-the-shelf and low-cost FPGA development board – the <a href="http://www.pynq.io/">Pynq board</a>, which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
 
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png" alt="image" width="70%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png" alt="image" width="70%" /></p>
 
 <p>The VTA release offers a simple compilation and deployment flow of the VTA hardware design and TVM workloads on the Pynq platform, with the help of an RPC server interface.
 The RPC server handles FPGA reconfiguration tasks and TVM module invocation offloading onto the VTA runtime.
@@ -253,7 +263,7 @@ While this platform is meant for prototyping (the 2012 FPGA cannot compete with
 <p>A popular method used to assess the efficient use of hardware is the roofline diagram: given a hardware design, it shows how efficiently different workloads utilize the hardware’s compute and memory resources. The roofline plot below shows the throughput achieved on different convolution layers of the ResNet-18 inference benchmark. Each layer has a different arithmetic intensity, i.e. ratio of compute to data movement.
 In the left half, convolution layers are bandwidth limited, whereas on the right half, they are compute limited.</p>
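+
+<p>For intuition, arithmetic intensity can be estimated as total operations divided by bytes moved. The numbers below are purely illustrative (not measurements from the VTA benchmark):</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code># a 3x3 convolution layer with int8 operands (illustrative shapes,
+# assuming 'same' padding so the output is also H x W)
+N, C, H, W, K, R, S = 1, 64, 56, 56, 64, 3, 3
+ops = 2 * N * K * H * W * C * R * S            # multiply and accumulate count separately
+bytes_moved = N*C*H*W + K*C*R*S + N*K*H*W      # int8: one byte per element
+print(ops / bytes_moved)                       # ops per byte of memory traffic
+</code></pre>
+</div>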
 
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png" alt="image" width="60%" /></p>
 
 <p>The goal behind designing a hardware architecture and a compiler stack is to bring each workload as close as possible to the roofline of the target hardware.
 The roofline plot shows the effects of having the hardware and compiler work together to maximize utilization of the available hardware resources.
@@ -262,7 +272,7 @@ The result is an overall higher utilization of the available compute and memory
 
 <h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18 evaluation</h3>
 
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png" alt="image" width="60%" /></p>
 
 <p>A benefit of having a complete compiler stack built for VTA is the ability to run end-to-end workloads. This is compelling in the context of hardware acceleration because we need to understand what performance bottlenecks and Amdahl’s law limitations stand in the way of obtaining faster performance.
 The bar plot above shows inference performance with and without offloading the ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s ARM Cortex A9 SoC.
@@ -288,41 +298,37 @@ This kind of high-level visibility is essential to system designers who want to
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All right reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/08/10/DLPack-Bridge.html b/2018/08/10/DLPack-Bridge.html
index 0ec196d..7627daf 100644
--- a/2018/08/10/DLPack-Bridge.html
+++ b/2018/08/10/DLPack-Bridge.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Building a Cross-Framework Deep Learning Compiler via DLPack</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Building a Cross-Framework Deep Learning Compiler via DLPack </h1>
       <p class="post-meta">
-        <time datetime="2018-08-10T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-08-10T00:00:00-07:00" itemprop="datePublished">
           Aug 10, 2018
         </time>
         
@@ -175,7 +185,7 @@ can operate on DLPack tensors, providing wrappers bridging tensor data
 structures from frameworks such as PyTorch and MxNet <em>with zero-data-copy</em>.</p>
 
 <p>DLPack presents a simple, portable in-memory data structure:</p>
-<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*!
+<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="cm">/*!
  * \brief Plain C Tensor object, does not manage memory.
  */</span>
 <span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
@@ -201,7 +211,8 @@ structures from frameworks such as PyTorch and MxNet <em>with zero-data-copy</em
   <span class="cm">/*! \brief The offset in bytes to the beginning pointer to data */</span>
   <span class="kt">uint64_t</span> <span class="n">byte_offset</span><span class="p">;</span>
 <span class="p">}</span> <span class="n">DLTensor</span><span class="p">;</span>
-</code></pre></div></div>
+</code></pre>
+</div>
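+
+<p>The zero-copy property implied by this structure can be observed directly from Python; the sketch below assumes PyTorch’s <code class="highlighter-rouge">torch.utils.dlpack</code> module is available:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>import torch
+from torch.utils import dlpack
+
+x = torch.rand(3, 3)
+capsule = dlpack.to_dlpack(x)    # wrap x's memory in a DLPack capsule
+y = dlpack.from_dlpack(capsule)  # view the same memory as a new tensor
+y[0, 0] = 42.0
+print(x[0, 0])                   # prints 42.0: no copy was made
+</code></pre>
+</div>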
 
 <p>As an example, we declare and compile a matrix multiplication operator in TVM,
 and build a wrapper that uses the DLPack representation to allow this operator
@@ -215,64 +226,68 @@ between frameworks and TVM:</p>
 Figure 1</p>
 
 <p>First, we compute a reference output in PyTorch:</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">torch</span>
-    <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
-    <span class="n">y</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
-    <span class="n">z</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">mm</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">torch</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">y</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">z</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">mm</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>We then define and build a TVM matrix multiplication operator, using the default
 schedule:</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">convert</span><span class="p">(</span><span class="mi">56</span><span class="p">)</span>
-    <span class="n">X</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'X'</span><span class="p">)</span>
-    <span class="n">Y</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'Y'</span><span class="p">)</span>
-
-    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
-    <span class="n">Z</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">j</span> <span class="p">:</span> <span class="n">tvm</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span  [...]
-    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Z</span><span class="p">.</span><span class="n">op</span><span class="p">)</span>
-    <span class="n">fmm</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> < [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>    <span class="n">n</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">convert</span><span class="p">(</span><span class="mi">56</span><span class="p">)</span>
+    <span class="n">X</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'X'</span><span class="p">)</span>
+    <span class="n">Y</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">placeholder</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'Y'</span><span class="p">)</span>
+
+    <span class="n">k</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'k'</span><span class="p">)</span>
+    <span class="n">Z</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">((</span><span class="n">n</span><span class="p">,</span><span class="n">n</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">,</span><span class="n">j</span> <span class="p">:</span> <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span  [...]
+    <span class="n">s</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">create_schedule</span><span class="p">(</span><span class="n">Z</span><span class="o">.</span><span class="n">op</span><span class="p">)</span>
+    <span class="n">fmm</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> < [...]
+</code></pre>
+</div>
 <p>For brevity, we do not cover TVM’s large collection of scheduling primitives
 that we can use to optimize matrix multiplication. If you wish to make a custom
 GEMM operator run <em>fast</em> on your hardware device, a detailed tutorial can be
 found <a href="https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.html">here</a>.</p>
 
 <p>We then convert the TVM function into one that supports PyTorch tensors:</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">from</span> <span class="nn">tvm.contrib.dlpack</span> <span class="kn">import</span> <span class="n">to_pytorch_func</span>
-    <span class="c1"># fmm is the previously built TVM function (Python function)
-</span>    <span class="c1"># fmm is the wrapped TVM function (Python function)
-</span>    <span class="n">fmm_pytorch</span> <span class="o">=</span> <span class="n">to_pytorch_func</span><span class="p">(</span><span class="n">fmm</span><span class="p">)</span>
-    <span class="n">z2</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>    <span class="kn">from</span> <span class="nn">tvm.contrib.dlpack</span> <span class="kn">import</span> <span class="n">to_pytorch_func</span>
+    <span class="c"># fmm is the previously built TVM function (Python function)</span>
+    <span class="c"># fmm is the wrapped TVM function (Python function)</span>
+    <span class="n">fmm_pytorch</span> <span class="o">=</span> <span class="n">to_pytorch_func</span><span class="p">(</span><span class="n">fmm</span><span class="p">)</span>
+    <span class="n">z2</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">)</span>
     <span class="n">fmm_pytorch</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z2</span><span class="p">)</span>
-    <span class="n">np</span><span class="p">.</span><span class="n">testing</span><span class="p">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="p">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">z2</span><span class="p">.</span><span class="n">numpy</span><span class="p">())</span>
-</code></pre></div></div>
+    <span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">z2</span><span class="o">.</span><span class="n">numpy</span><span class="p">())</span>
+</code></pre>
+</div>
 <p>and verify that the results match.</p>
 
 <p>We can repeat the same example, but using MxNet instead:</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">mxnet</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>    <span class="kn">import</span> <span class="nn">mxnet</span>
     <span class="kn">from</span> <span class="nn">tvm.contrib.mxnet</span> <span class="kn">import</span> <span class="n">to_mxnet_func</span>
-    <span class="n">ctx</span> <span class="o">=</span> <span class="n">mxnet</span><span class="p">.</span><span class="n">cpu</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
-    <span class="n">x</span> <span class="o">=</span> <span class="n">mxnet</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
-    <span class="n">y</span> <span class="o">=</span> <span class="n">mxnet</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
-    <span class="n">z</span> <span class="o">=</span> <span class="n">mxnet</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
-    <span class="n">f</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> <sp [...]
+    <span class="n">ctx</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">cpu</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
+    <span class="n">x</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">y</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">z</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">empty</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">56</span><span class="p">,</span><span class="mi">56</span><span class="p">),</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+    <span class="n">f</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">[</span><span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">Z</span><span class="p">],</span> <span class="n">target_host</span><span class="o">=</span><span class="s">'llvm'</span><span class="p">,</span> <sp [...]
     <span class="n">f_mxnet</span> <span class="o">=</span> <span class="n">to_mxnet_func</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
     <span class="n">f_mxnet</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">z</span><span class="p">)</span>
-    <span class="n">np</span><span class="p">.</span><span class="n">testing</span><span class="p">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="p">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">x</span><span class="p">.</span><span class="n">asnumpy</span><span class="p">().</span><span class="n">dot</span><span class="p">(</span><span class="n">y</span><span class="p">.</span><span class="n"> [...]
-</code></pre></div></div>
+    <span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_allclose</span><span class="p">(</span><span class="n">z</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">(),</span> <span class="n">x</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">y</span><span class="o">. [...]
+</code></pre>
+</div>
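
The to_mxnet_func wrapper used above can be built on the same generic convert_func mechanism described in the next section. A plausible sketch by analogy, assuming an MXNet build that exposes a dlpack export helper named mxnet.nd.to_dlpack_for_write (the helper name is an assumption, not taken from the post):

    import mxnet

    def to_mxnet_func(tvm_func):
        # Wrap a compiled TVM function so it accepts mxnet.nd.NDArray
        # arguments directly; to_dlpack_for_write (assumed available in
        # this MXNet build) exports a writable dlpack view of the NDArray.
        return convert_func(tvm_func, mxnet.nd.NDArray,
                            mxnet.nd.to_dlpack_for_write)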
 
 <h2 id="under-the-hood-of-the-pytorch-example">Under the hood of the PyTorch Example</h2>
-<p>As TVM provides <a href="https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455">functions</a> to convert dlpack tensors to tvm <code class="language-plaintext highlighter-rouge">NDArray</code>s and
+<p>As TVM provides <a href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455">functions</a> to convert dlpack tensors to tvm <code class="highlighter-rouge">NDArray</code>s and
 vice versa, all that is needed is some syntactic sugar to wrap the functions.
-<code class="language-plaintext highlighter-rouge">convert_func</code> is a generic converter for frameworks using tensors with dlpack
+<code class="highlighter-rouge">convert_func</code> is a generic converter for frameworks using tensors with dlpack
 support, and can be used to implement convenient converters, such as
-<code class="language-plaintext highlighter-rouge">to_pytorch_func</code>.</p>
+<code class="highlighter-rouge">to_pytorch_func</code>.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">tensor_type</span><span class="p">,</span> <span class="n">to_dlpack_func</span><span class="p">):</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">tensor_type</span><span class="p">,</span> <span class="n">to_dlpack_func</span><span class="p">):</span>
     <span class="k">assert</span> <span class="nb">callable</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">)</span>
 
     <span class="k">def</span> <span class="nf">_wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">):</span>
-        <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">ndarray</span><span class="p">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">to_dlpack_func</span><span class="p">(</span><span class="n">arg</span><span class="p">))</span>\
+        <span class="n">args</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">(</span><span class="n">ndarray</span><span class="o">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">to_dlpack_func</span><span class="p">(</span><span class="n">arg</span><span class="p">))</span>\
             <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">tensor_type</span><span class="p">)</span> <span class="k">else</span> <span class="n">arg</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">)</span>
         <span class="k">return</span> <span class="n">tvm_func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">)</span>
 
@@ -281,49 +296,46 @@ support, and can be used to implement convenient converters, such as
 <span class="k">def</span> <span class="nf">to_pytorch_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">):</span>
     <span class="kn">import</span> <span class="nn">torch</span>
     <span class="kn">import</span> <span class="nn">torch.utils.dlpack</span>
-    <span class="k">return</span> <span class="n">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">torch</span><span class="p">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">dlpack</span><span class="p">.</span><span class="n">to_dlpack</span><span class="p">)</span>
-</code></pre></div></div>
+    <span class="k">return</span> <span class="n">convert_func</span><span class="p">(</span><span class="n">tvm_func</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">dlpack</span><span class="o">.</span><span class="n">to_dlpack</span><span class="p">)</span>
+</code></pre>
+</div>
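
Putting the pieces together, the wrapped function can be called directly on PyTorch tensors. A minimal end-to-end sketch (the elementwise addition, shapes, and variable names below are illustrative, not taken from the post):

    import numpy as np
    import torch
    import torch.utils.dlpack
    import tvm

    n = 56
    X = tvm.placeholder((n, n), name='X')
    Y = tvm.placeholder((n, n), name='Y')
    Z = tvm.compute((n, n), lambda i, j: X[i, j] + Y[i, j], name='Z')
    s = tvm.create_schedule(Z.op)
    f = tvm.build(s, [X, Y, Z], target='llvm')

    f_pytorch = to_pytorch_func(f)   # wrap once, reuse on any torch tensors

    x, y = torch.rand(n, n), torch.rand(n, n)
    z = torch.empty(n, n)
    f_pytorch(x, y, z)               # tensors are shared via dlpack, no copies
    np.testing.assert_allclose(z.numpy(), (x + y).numpy())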
 
     </div>
   </div>
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All rights reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/10/03/auto-opt-all.html b/2018/10/03/auto-opt-all.html
index f5f1482..b50d09a 100644
--- a/2018/10/03/auto-opt-all.html
+++ b/2018/10/03/auto-opt-all.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Automatic Kernel Optimization for Deep Learning on All Hardware Platforms</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Automatic Kernel Optimization for Deep Learning on All Hardware Platforms </h1>
       <p class="post-meta">
-        <time datetime="2018-10-03T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-10-03T00:00:00-07:00" itemprop="datePublished">
           Oct 3, 2018
         </time>
         
@@ -203,20 +213,21 @@ The detailed instructions are omitted due to the space limit of a blog post.
 Links to tutorials for ARM CPU, Mali GPU, NVIDIA GPU, AMD GPU are all available at the end of this blog.</p>
 
 <p>First, we get a pre-trained model from the MXNet model zoo and extract tuning tasks from it.</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">mxnet.gluon.model_zoo.vision</span> <span class="kn">import</span> <span class="n">get_model</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">mxnet.gluon.model_zoo.vision</span> <span class="kn">import</span> <span class="n">get_model</span>
 
 <span class="n">block</span> <span class="o">=</span> <span class="n">get_model</span><span class="p">(</span><span class="s">'resnet18_v1'</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
-<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">nnvm</span><span class="p">.</span><span class="n">frontend</span><span class="p">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
+<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">nnvm</span><span class="o">.</span><span class="n">frontend</span><span class="o">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
 
-<span class="n">tasks</span> <span class="o">=</span> <span class="n">autotvm</span><span class="p">.</span><span class="n">extract_from_graph</span><span class="p">(</span><span class="n">net</span><span class="p">)</span>
+<span class="n">tasks</span> <span class="o">=</span> <span class="n">autotvm</span><span class="o">.</span><span class="n">extract_from_graph</span><span class="p">(</span><span class="n">net</span><span class="p">)</span>
 <span class="n">tune_tasks</span><span class="p">(</span><span class="n">tasks</span><span class="p">,</span> <span class="o">**</span><span class="n">tuning_option</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 <p>There are 12 different conv2d layers in resnet-18, so we launch 12 tuning tasks.
 For each of them, the tuner makes several hundred trials and picks the best one.
 After finishing all tuning tasks, we compile the whole network and generate a single deployable minimal library.
 One sample output is</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Extract tasks...
+<div class="highlighter-rouge"><pre class="highlight"><code>Extract tasks...
 Tuning...
 [Task  1/12]  Current/Best:   22.37/  52.19 GFLOPS | Progress: (544/1000) | 406.59 s Done.
 [Task  2/12]  Current/Best:    6.51/  18.77 GFLOPS | Progress: (608/1000) | 325.05 s Done.
@@ -234,7 +245,8 @@ Compile...
 Upload...
 Evaluate inference time cost...
 Mean inference time (std dev): 162.59 ms (0.06 ms)
-</code></pre></div></div>
+</code></pre>
+</div>
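
For reference, a minimal sketch of what tune_tasks and tuning_option can look like is below, followed by the final compilation step that applies the best configurations. The builder/runner settings, trial counts, and file names are illustrative assumptions, not the exact configuration behind the numbers above; net and params come from the earlier snippet.

    import nnvm.compiler
    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    tuning_option = {
        'log_filename': 'tuning.log',
        'n_trial': 1000,
        'measure_option': autotvm.measure_option(
            builder=autotvm.LocalBuilder(timeout=10),
            runner=autotvm.LocalRunner(number=5, repeat=3, timeout=4)),
    }

    def tune_tasks(tasks, measure_option, n_trial, log_filename):
        for task in tasks:
            # One XGBoost-based tuner per conv2d task; best configs are
            # appended to the log file as tuning progresses.
            tuner = XGBTuner(task, loss_type='rank')
            tuner.tune(n_trial=min(n_trial, len(task.config_space)),
                       measure_option=measure_option,
                       callbacks=[autotvm.callback.log_to_file(log_filename)])

    # Compile the whole network with the best schedules found during tuning.
    with autotvm.apply_history_best('tuning.log'):
        with nnvm.compiler.build_config(opt_level=3):
            graph, lib, params = nnvm.compiler.build(
                net, target='llvm', shape={'data': (1, 3, 224, 224)}, params=params)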
 
 <p>The tuning is especially helpful and worth a try if your model has unusual shapes or
 your hardware is customized, as hand-optimized static libraries cannot cover every situation.</p>
@@ -544,41 +556,37 @@ for inference deployment. TVM just provides such a solution.</p>
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All rights reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/10/09/ml-in-tees.html b/2018/10/09/ml-in-tees.html
index 3838be6..b7c74f2 100644
--- a/2018/10/09/ml-in-tees.html
+++ b/2018/10/09/ml-in-tees.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Efficient Privacy-Preserving ML Using TVM</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
+
 
-</div>
 
 
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Efficient Privacy-Preserving ML Using TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-10-09T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2018-10-09T00:00:00-07:00" itemprop="datePublished">
           Oct 9, 2018
         </time>
         
@@ -266,41 +276,37 @@ His research interest is in the general domain of ML on shared private data, but
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All rights reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
 
-</section>
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2018/12/18/lowprecision-conv.html b/2018/12/18/lowprecision-conv.html
index 31738a3..409fe14 100644
--- a/2018/12/18/lowprecision-conv.html
+++ b/2018/12/18/lowprecision-conv.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Automating Generation of Low Precision Deep Learning Operators</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Automating Generation of Low Precision Deep Learning Operators </h1>
       <p class="post-meta">
-        <time datetime="2018-12-18T00:00:00-05:00" itemprop="datePublished">
+        <time datetime="2018-12-18T00:00:00-08:00" itemprop="datePublished">
           Dec 18, 2018
         </time>
         
@@ -220,37 +230,38 @@ popcount to accumulate values in the packed data. The bitplane axes become addit
 and compute the binary dot products between different bitplanes of the input and kernel.
 Finally, the output is computed in an unpacked format and in higher precision.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Input_bitpacked</span> <span class="o">=</span> <span class="n">bitpack</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">activation_bits</span><span class="p">,</span> <span class="n">pack_axis</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bit_axis</span><span class="o">=</spa [...]
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">Input_bitpacked</span> <span class="o">=</span> <span class="n">bitpack</span><span class="p">(</span><span class="n">Input</span><span class="p">,</span> <span class="n">activation_bits</span><span class="p">,</span> <span class="n">pack_axis</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">bit_axis</span><span class="o">=</span><span class="mi">4</s [...]
 <span class="n">Weights_bitpacked</span> <span class="o">=</span> <span class="n">bitpack</span><span class="p">(</span><span class="n">Filter</span><span class="p">,</span> <span class="n">weight_bits</span><span class="p">,</span> <span class="n">pack_axis</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">bit_axis</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">pack_type</span><span class="o"> [...]
-<span class="n">batch</span><span class="p">,</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">in_width</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Input_bitpacked</span><span class="p">.</span><span class="n">shape</span>
-<span class="n">kernel_h</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">num_filter</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Filter_bitpakced</span><span class="p">.</span><span class="n">shape</span>
+<span class="n">batch</span><span class="p">,</span> <span class="n">in_height</span><span class="p">,</span> <span class="n">in_width</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Input_bitpacked</span><span class="o">.</span><span class="n">shape</span>
+<span class="n">kernel_h</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">num_filter</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">Filter_bitpakced</span><span class="o">.</span><span class="n">shape</span>
 
 <span class="n">stride_h</span><span class="p">,</span> <span class="n">stride_w</span> <span class="o">=</span> <span class="n">stride</span>
 <span class="n">pad_top</span><span class="p">,</span> <span class="n">pad_left</span><span class="p">,</span> <span class="n">pad_down</span><span class="p">,</span> <span class="n">pad_right</span> <span class="o">=</span> <span class="n">get_pad_tuple</span><span class="p">(</span><span class="n">padding</span><span class="p">,</span> <span class="p">(</span><span class="n">kernel_h</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">))</span>
 
-<span class="c1"># Computing the output shape
-</span><span class="n">out_channel</span> <span class="o">=</span> <span class="n">num_filter</span>
+<span class="c"># Computing the output shape</span>
+<span class="n">out_channel</span> <span class="o">=</span> <span class="n">num_filter</span>
 <span class="n">out_height</span> <span class="o">=</span> <span class="n">simplify</span><span class="p">((</span><span class="n">in_height</span> <span class="o">-</span> <span class="n">kernel_h</span> <span class="o">+</span> <span class="n">pad_top</span> <span class="o">+</span> <span class="n">pad_down</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride_h</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
 <span class="n">out_width</span> <span class="o">=</span> <span class="n">simplify</span><span class="p">((</span><span class="n">in_width</span> <span class="o">-</span> <span class="n">kernel_w</span> <span class="o">+</span> <span class="n">pad_left</span> <span class="o">+</span> <span class="n">pad_right</span><span class="p">)</span> <span class="o">//</span> <span class="n">stride_w</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
 <span class="n">pad_before</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">pad_top</span><span class="p">,</span> <span class="n">pad_left</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
 <span class="n">pad_after</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">pad_down</span><span class="p">,</span> <span class="n">pad_right</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
 <span class="n">Input_padded</span> <span class="o">=</span> <span class="n">pad</span><span class="p">(</span><span class="n">Input_bitpacked</span><span class="p">,</span> <span class="n">pad_before</span><span class="p">,</span> <span class="n">pad_after</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s">"PaddedInput"</span><span class="p">)</span>
 
-<span class="c1"># Treat the bitplane axes like additional reduction axes
-</span><span class="n">rc</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rc'</span><span class="p">)</span>
-<span class="n">ry</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_h</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ry'</span><span class="p">)</span>
-<span class="n">rx</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rx'</span><span class="p">)</span>
-<span class="n">ib</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">input_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ib'</span><span class="p">)</span>
-<span class="n">wb</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">weight_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'wb'</span><span class="p">)</span>
+<span class="c"># Treat the bitplane axes like additional reduction axes</span>
+<span class="n">rc</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">in_channel_q</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rc'</span><span class="p">)</span>
+<span class="n">ry</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_h</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ry'</span><span class="p">)</span>
+<span class="n">rx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">kernel_w</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'rx'</span><span class="p">)</span>
+<span class="n">ib</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">input_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'ib'</span><span class="p">)</span>
+<span class="n">wb</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">reduce_axis</span><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="n">weight_bits</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="s">'wb'</span><span class="p">)</span>
 
 
-<span class="n">tvm</span><span class="p">.</span><span class="n">compute</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">out_height</span><span class="p">,</span> <span class="n">out_width</span><span class="p">,</span> <span class="n">out_channel</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">nn</span><span class="p">,</span> <span class="n">yy</span><span class="p">,</span> <span class="n">xx</span><spa [...]
-             <span class="n">tvm</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">tvm</span><span class="p">.</span><span class="n">popcount</span><span class="p">(</span>
+<span class="n">tvm</span><span class="o">.</span><span class="n">compute</span><span class="p">((</span><span class="n">batch</span><span class="p">,</span> <span class="n">out_height</span><span class="p">,</span> <span class="n">out_width</span><span class="p">,</span> <span class="n">out_channel</span><span class="p">),</span> <span class="k">lambda</span> <span class="n">nn</span><span class="p">,</span> <span class="n">yy</span><span class="p">,</span> <span class="n">xx</span><spa [...]
+             <span class="n">tvm</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">tvm</span><span class="o">.</span><span class="n">popcount</span><span class="p">(</span>
                <span class="n">Input_padded</span><span class="p">[</span><span class="n">nn</span><span class="p">,</span> <span class="n">yy</span> <span class="o">*</span> <span class="n">stride_h</span> <span class="o">+</span> <span class="n">ry</span><span class="p">,</span> <span class="n">xx</span> <span class="o">*</span> <span class="n">stride_w</span> <span class="o">+</span> <span class="n">rx</span><span class="p">,</span> <span class="n">rc</span><span class="p">,</span> <s [...]
-               <span class="n">Weights_bitpacked</span><span class="p">[</span><span class="n">ry</span><span class="p">,</span> <span class="n">rx</span><span class="p">,</span> <span class="n">rc</span><span class="p">,</span> <span class="n">ff</span><span class="p">,</span> <span class="n">wb</span><span class="p">]))</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">ib</span><span class="o">+</span><span class="n">wb</span><span class="p">))).</span><spa [...]
+               <span class="n">Weights_bitpacked</span><span class="p">[</span><span class="n">ry</span><span class="p">,</span> <span class="n">rx</span><span class="p">,</span> <span class="n">rc</span><span class="p">,</span> <span class="n">ff</span><span class="p">,</span> <span class="n">wb</span><span class="p">]))</span> <span class="o">&lt;&lt;</span> <span class="p">(</span><span class="n">ib</span><span class="o">+</span><span class="n">wb</span><span class="p">)))</span><span [...]
                <span class="n">axis</span><span class="o">=</span><span class="p">[</span><span class="n">rc</span><span class="p">,</span> <span class="n">ry</span><span class="p">,</span> <span class="n">rx</span><span class="p">,</span> <span class="n">wb</span><span class="p">,</span> <span class="n">ib</span><span class="p">]))</span>
 
-</code></pre></div></div>
+</code></pre>
+</div>
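
To make the shift by (ib + wb) concrete: each pair of input/weight bitplanes contributes an AND-plus-popcount dot product, weighted by the combined bit position of the two planes. A small standalone illustration of this arithmetic for unsigned values (plain Python, not part of the TVM operator):

    def bitserial_dot(a, b, a_bits, w_bits):
        # Dot product of two low-precision unsigned vectors, computed one
        # pair of bitplanes at a time, mirroring the compute rule above.
        total = 0
        for ib in range(a_bits):
            for wb in range(w_bits):
                plane_a = [(x >> ib) & 1 for x in a]
                plane_b = [(y >> wb) & 1 for y in b]
                pop = sum(pa & pb for pa, pb in zip(plane_a, plane_b))
                total += pop << (ib + wb)   # weight by combined bit position
        return total

    a = [3, 1, 2, 0]   # 2-bit activations
    b = [1, 2, 3, 3]   # 2-bit weights
    assert bitserial_dot(a, b, 2, 2) == sum(x * y for x, y in zip(a, b))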
 
 <p>In our schedule we apply common optimizations like vectorization and memory tiling to provide better
 memory locality and take advantage of SIMD units. Some of these optimizations such as tiling,
@@ -292,8 +303,8 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
 <h2 id="show-me-the-code">Show me the code</h2>
 
 <ul>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py">TOPI bitserial convolution</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI ARM cpu bitserial convolution</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI bitserial convolution</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI ARM cpu bitserial convolution</a></li>
 </ul>
 
 <h2 id="references">References</h2>
@@ -311,41 +322,37 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
 </div>
 </div>
 
diff --git a/2019/01/19/Golang.html b/2019/01/19/Golang.html
index 87e1e11..ecc2aee 100644
--- a/2019/01/19/Golang.html
+++ b/2019/01/19/Golang.html
@@ -1,146 +1,156 @@
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>TVM Golang Runtime for Deep Learning Deployment </h1>
       <p class="post-meta">
-        <time datetime="2019-01-19T00:00:00-05:00" itemprop="datePublished">
+        <time datetime="2019-01-19T00:00:00-08:00" itemprop="datePublished">
           Jan 19, 2019
         </time>
         
@@ -169,12 +179,12 @@ integrates the TVM runtime can load these compiled modules and perform inference
 import and compilation using TVM can be found at <a href="https://tvm.apache.org/docs//tutorials/">tutorials</a>.</p>
 
 <p>TVM now supports deploying compiled modules through Golang. Golang applications can make use of this
-to deploy the deep learning models through TVM. The scope of this blog is the introduction of <code class="language-plaintext highlighter-rouge">gotvm</code> package,
-the package build process and a sample application using <code class="language-plaintext highlighter-rouge">gotvm</code> to load a compiled module and perform inference.</p>
+to deploy deep learning models through TVM. The scope of this blog is to introduce the <code class="highlighter-rouge">gotvm</code> package,
+the package build process, and a sample application that uses <code class="highlighter-rouge">gotvm</code> to load a compiled module and perform inference.</p>
 
 <h2 id="package">Package</h2>
 
-<p>The golang package <code class="language-plaintext highlighter-rouge">gotvm</code> is built on top of TVM’s C runtime interface. The API in this package
+<p>The golang package <code class="highlighter-rouge">gotvm</code> is built on top of TVM’s C runtime interface. The API in this package
 abstracts the native C types and provides Golang-compatible types. The package source can be found
 at <a href="https://github.com/dmlc/tvm/tree/master/golang">gotvm</a>.</p>
 
@@ -187,10 +197,10 @@ necessary conversions across API calls.</p>
 
 <h2 id="how-to">How to</h2>
 
-<p>As shown in the below diagram <code class="language-plaintext highlighter-rouge">gotvm</code> enables golang applications to integrate deep learning models
+<p>As shown in the diagram below, <code class="highlighter-rouge">gotvm</code> enables golang applications to integrate deep learning models
 from various frameworks without the hassle of understanding each framework’s interface API.
 Developers can make use of TVM to import and compile deep learning models and generate TVM artifacts.
-<code class="language-plaintext highlighter-rouge">gotvm</code> package provides golang friendly API to load, configure, feed input and get output.</p>
+The <code class="highlighter-rouge">gotvm</code> package provides a golang-friendly API to load, configure, feed input, and get output.</p>
 
 <p style="text-align: center"><img src="/images/golang/TVM-Golang-Flow.png" alt="image" width="100%" /></p>
 <center> Import, Compile, Integrate and Deploy</center>
@@ -202,8 +212,8 @@ generates the artifacts required to integrate and deploy the model on a target.<
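 
 <p>For reference, the compile step that produces the three artifacts loaded by the Go
 example below (the compiled library, the graph JSON, and the params file) can be sketched
 on the Python side as follows. This assumes the 2019-era relay build API;
 <code class="highlighter-rouge">mod</code> and <code class="highlighter-rouge">params</code> are assumed to come from a frontend importer.</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code>from tvm import relay
 
 # `mod` and `params` are assumed to come from a frontend importer,
 # e.g. relay.frontend.from_mxnet(...).
 graph, lib, params = relay.build(mod, target="llvm", params=params)
 
 lib.export_library("libdeploy.so")      # compiled library
 with open("deploy.json", "w") as f:     # graph JSON
     f.write(graph)
 with open("deploy.params", "wb") as f:  # serialized params
     f.write(relay.save_param_dict(params))
 </code></pre></div>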
 
 <h2 id="api">API</h2>
 
-<p><code class="language-plaintext highlighter-rouge">gotvm</code> package provides a handful of datatypes and API functions to initialize, load and infer
-from a golang application. Like any other golang package we just need to import <code class="language-plaintext highlighter-rouge">gotvm</code> package here.</p>
+<p>The <code class="highlighter-rouge">gotvm</code> package provides a handful of datatypes and API functions to initialize, load, and infer
+from a golang application. Like any other golang package, we just need to import the <code class="highlighter-rouge">gotvm</code> package here.</p>
 
 <ul>
   <li>Module : The Module API can be used to load a TVM compiled module into TVM runtime and access any functions.</li>
@@ -218,44 +228,44 @@ from a golang application. Like any other golang package we just need to import
 <p>A simple example with inline documentation of loading a compiled module and performing inference is shown below.
 For simplicity the error handling is ignored here, but is important in real applications.</p>
 
-<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
+<div class="language-cpp highlighter-rouge"><pre class="highlight"><code>
 <span class="n">package</span> <span class="n">main</span>
 
-<span class="c1">// Import compiled gotvm package.</span>
-<span class="n">import</span> <span class="p">(</span>
+<span class="c1">// Import compiled gotvm package.
+</span><span class="n">import</span> <span class="p">(</span>
     <span class="s">"./gotvm"</span>
 <span class="p">)</span>
 
-<span class="c1">// Some constants for TVM compiled model paths.</span>
-<span class="c1">// modLib : Is the compiled library exported out of compilation.</span>
-<span class="c1">// modJson : TVM graph JSON.</span>
-<span class="c1">// modParams : Exported params out of TVM compilation process.</span>
-<span class="k">const</span> <span class="p">(</span>
+<span class="c1">// Some constants for TVM compiled model paths.
+// modLib : Is the compiled library exported out of compilation.
+// modJson : TVM graph JSON.
+// modParams : Exported params out of TVM compilation process.
+</span><span class="k">const</span> <span class="p">(</span>
     <span class="n">modLib</span>    <span class="o">=</span> <span class="s">"./libdeploy.so"</span>
     <span class="n">modJSON</span>   <span class="o">=</span> <span class="s">"./deploy.json"</span>
     <span class="n">modParams</span> <span class="o">=</span> <span class="s">"./deploy.params"</span>
 <span class="p">)</span>
 
-<span class="c1">// main</span>
-<span class="n">func</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
-    <span class="c1">// Some util API to query underlying TVM and DLPack version information.</span>
-    <span class="n">fmt</span><span class="p">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"TVM Version   : v%v</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">TVMVersion</span><span class="p">)</span>
+<span class="c1">// main
+</span><span class="n">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
+    <span class="c1">// Some util API to query underlying TVM and DLPack version information.
+</span>    <span class="n">fmt</span><span class="p">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"TVM Version   : v%v</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">TVMVersion</span><span class="p">)</span>
     <span class="n">fmt</span><span class="p">.</span><span class="n">Printf</span><span class="p">(</span><span class="s">"DLPACK Version: v%v</span><span class="se">\n\n</span><span class="s">"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">DLPackVersion</span><span class="p">)</span>
 
-    <span class="c1">// Import tvm module (so).</span>
-    <span class="n">modp</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">LoadModuleFromFile</span><span class="p">(</span><span class="n">modLib</span><span class="p">)</span>
+    <span class="c1">// Import tvm module (so).
+</span>    <span class="n">modp</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">LoadModuleFromFile</span><span class="p">(</span><span class="n">modLib</span><span class="p">)</span>
 
-    <span class="c1">// Load module on tvm runtime - call tvm.graph_runtime.create</span>
-    <span class="c1">// with module and graph JSON.</span>
-    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modJSON</span><span class="p">)</span>
+    <span class="c1">// Load module on tvm runtime - call tvm.graph_runtime.create
+</span>    <span class="c1">// with module and graph JSON.
+</span>    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modJSON</span><span class="p">)</span>
     <span class="n">jsonStr</span> <span class="o">:=</span> <span class="n">string</span><span class="p">(</span><span class="n">bytes</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">GetGlobalFunction</span><span class="p">(</span><span class="s">"tvm.graph_runtime.create"</span><span class="p">)</span>
     <span class="n">graphrt</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">jsonStr</span><span class="p">,</span> <span class="n">modp</span><span class="p">,</span> <span class="p">(</span><span class="n">int64</span><span class="p">)(</span><span class="n">gotvm</span><span class="p">.</span><span class="n">KDLCPU</span><span class=" [...]
     <span class="n">graphmod</span> <span class="o">:=</span> <span class="n">graphrt</span><span class="p">.</span><span class="n">AsModule</span><span class="p">()</span>
 
 
-    <span class="c1">// Allocate input &amp; output arrays and fill some data for input.</span>
-    <span class="n">tshapeIn</span>  <span class="o">:=</span> <span class="p">[]</span><span class="n">int64</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">3</span><span class="p">}</span>
+    <span class="c1">// Allocate input &amp; output arrays and fill some data for input.
+</span>    <span class="n">tshapeIn</span>  <span class="o">:=</span> <span class="p">[]</span><span class="n">int64</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">224</span><span class="p">,</span> <span class="mi">3</span><span class="p">}</span>
     <span class="n">tshapeOut</span> <span class="o">:=</span> <span class="p">[]</span><span class="n">int64</span><span class="p">{</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1001</span><span class="p">}</span>
     <span class="n">inX</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">Empty</span><span class="p">(</span><span class="n">tshapeIn</span><span class="p">,</span> <span class="s">"float32"</span><span class="p">,</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">CPU</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
     <span class="n">out</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">gotvm</span><span class="p">.</span><span class="n">Empty</span><span class="p">(</span><span class="n">tshapeOut</span><span class="p">)</span>
@@ -266,41 +276,42 @@ For simplicity the error handling is ignored here, but is important in real appl
                                                <span class="n">rand</span><span class="p">.</span><span class="n">Float32</span><span class="p">()</span> <span class="p">})</span>
     <span class="n">inX</span><span class="p">.</span><span class="n">CopyFrom</span><span class="p">(</span><span class="n">inSlice</span><span class="p">)</span>
 
-    <span class="c1">// Load params</span>
-    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modParams</span><span class="p">)</span>
+    <span class="c1">// Load params
+</span>    <span class="n">bytes</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">ioutil</span><span class="p">.</span><span class="n">ReadFile</span><span class="p">(</span><span class="n">modParams</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"load_params"</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">bytes</span><span class="p">)</span>
 
 
-    <span class="c1">// Set module input</span>
-    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"set_input"</span><span class="p">)</span>
+    <span class="c1">// Set module input
+</span>    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"set_input"</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="s">"input"</span><span class="p">,</span> <span class="n">inX</span><span class="p">)</span>
 
-    <span class="c1">// Run or Execute the graph</span>
-    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"run"</span><span class="p">)</span>
+    <span class="c1">// Run or Execute the graph
+</span>    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"run"</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">()</span>
 
-    <span class="c1">// Get output from runtime.</span>
-    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"get_output"</span><span class="p">)</span>
+    <span class="c1">// Get output from runtime.
+</span>    <span class="n">funp</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">graphmod</span><span class="p">.</span><span class="n">GetFunction</span><span class="p">(</span><span class="s">"get_output"</span><span class="p">)</span>
     <span class="n">funp</span><span class="p">.</span><span class="n">Invoke</span><span class="p">(</span><span class="n">int64</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span> <span class="n">out</span><span class="p">)</span>
 
-    <span class="c1">// Access output tensor data.</span>
-    <span class="n">outIntf</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">out</span><span class="p">.</span><span class="n">AsSlice</span><span class="p">()</span>
+    <span class="c1">// Access output tensor data.
+</span>    <span class="n">outIntf</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">out</span><span class="p">.</span><span class="n">AsSlice</span><span class="p">()</span>
     <span class="n">outSlice</span> <span class="o">:=</span> <span class="n">outIntf</span><span class="p">.([]</span><span class="n">float32</span><span class="p">)</span>
 
-    <span class="c1">// outSlice here holds flattened output data as a golang slice.</span>
-<span class="p">}</span>
-</code></pre></div></div>
+    <span class="c1">// outSlice here holds flattened output data as a golang slice.
+</span><span class="p">}</span>
+</code></pre>
+</div>
 
-<p><code class="language-plaintext highlighter-rouge">gotvm</code> extends the TVM packed function system to support golang function closures as packed functions.
-<a href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a> available to register golang
+<p><code class="highlighter-rouge">gotvm</code> extends the TVM packed function system to support golang function closures as packed functions.
+<a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a> available to register golang
 closure as TVM packed function and invoke the same across programming language barriers.</p>
 
 <h2 id="show-me-the-code">Show me the code</h2>
 
 <ul>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/golang/src">Package Source</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/golang/src">Package Source</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
 </ul>
 
 <h2 id="references">References</h2>
@@ -320,41 +331,37 @@ closure as TVM packed function and invoke the same across programming language b
 </div>
 </div>
 
diff --git a/2019/03/18/tvm-apache-announcement.html b/2019/03/18/tvm-apache-announcement.html
index 0e06763..b154327 100644
--- a/2019/03/18/tvm-apache-announcement.html
+++ b/2019/03/18/tvm-apache-announcement.html
@@ -1,146 +1,156 @@
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>TVM Deep Learning Compiler Joins Apache Software Foundation </h1>
       <p class="post-meta">
-        <time datetime="2019-03-18T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2019-03-18T00:00:00-07:00" itemprop="datePublished">
           Mar 18, 2019
         </time>
         
@@ -173,41 +183,37 @@
 </div>
 </div>
 
diff --git a/2019/04/29/opt-cuda-quantized.html b/2019/04/29/opt-cuda-quantized.html
index aebb4c7..ce8baf7 100644
--- a/2019/04/29/opt-cuda-quantized.html
+++ b/2019/04/29/opt-cuda-quantized.html
@@ -1,146 +1,156 @@
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Automating Optimization of Quantized Deep Learning Models on CUDA </h1>
       <p class="post-meta">
-        <time datetime="2019-04-29T12:00:00-04:00" itemprop="datePublished">
+        <time datetime="2019-04-29T09:00:00-07:00" itemprop="datePublished">
           Apr 29, 2019
         </time>
         
@@ -155,7 +165,7 @@
     <p>Deep learning has been successfully applied to a variety of tasks.
 In real-time scenarios, such as inference on autonomous vehicles, the inference speed of the model is critical.
 Network quantization is an effective approach to accelerating deep learning models.
-In quantized models, both data and model parameters are represented with low precision data types such as <code class="language-plaintext highlighter-rouge">int8</code> and <code class="language-plaintext highlighter-rouge">float16</code>.
+In quantized models, both data and model parameters are represented with low precision data types such as <code class="highlighter-rouge">int8</code> and <code class="highlighter-rouge">float16</code>.
 The lowered data bandwidth reduces the inference time and memory/storage requirements, as well as the power consumption.
 Meanwhile, under proper quantization schemes, we can minimize the accuracy drop of the quantized models.
 Therefore, quantized models are of particular interest to researchers and developers, as quantization makes large models suitable for deployment on diverse devices such as GPUs, CPUs, and mobile devices.</p>
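 
 <p>As a reminder of what this representation means numerically, a minimal affine int8
 quantization example is shown below (the scale and zero point values are illustrative):</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy as np
 
 scale, zero_point = 0.05, 0  # illustrative quantization parameters
 x = np.array([0.12, -0.5, 1.1], dtype="float32")
 # Quantize to int8, then dequantize to see the approximation error.
 q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype("int8")
 x_hat = (q.astype("float32") - zero_point) * scale
 </code></pre></div>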
@@ -177,38 +187,38 @@ In emerging models such as ResNeXt and Deformable ConvNets, the automatic optimi
 
 <h1 id="expressing-quantized-cuda-kernels-in-tvm">Expressing Quantized CUDA Kernels in TVM</h1>
 <h2 id="leveraging-tensor-intrinsics-via-tensorization">Leveraging Tensor Intrinsics via Tensorization</h2>
-<p>Many platforms provide architecture-specific instructions for special computation patterns, for example, the SIMD instructions on x86, and the <code class="language-plaintext highlighter-rouge">dp4a</code> and <code class="language-plaintext highlighter-rouge">hfma</code> instructions on CUDA.
+<p>Many platforms provide architecture-specific instructions for special computation patterns, for example, the SIMD instructions on x86, and the <code class="highlighter-rouge">dp4a</code> and <code class="highlighter-rouge">hfma</code> instructions on CUDA.
 These intrinsic instructions are highly optimized for specific devices.
 By leveraging hardware intrinsics, we can achieve a significant performance boost for quantized operators.</p>
 
 <p>Currently, <a href="https://devblogs.nvidia.com/mixed-precision-programming-cuda-8/">dp4a</a> has been extensively used in TVM int8 operators on CUDA.
-<code class="language-plaintext highlighter-rouge">dp4a</code> is a CUDA intrinsic on Compute Capability 6.1 devices.
+<code class="highlighter-rouge">dp4a</code> is a CUDA intrinsic on Compute Capability 6.1 devices.
 It is a mixed-precision instruction that efficiently computes the dot product between two 4-element 8-bit integer vectors and accumulates the result in 32-bit format.
-Using <code class="language-plaintext highlighter-rouge">dp4a</code>, we can implement a dot product between 8-bit integer vectors with number of elements evenly divisible by four.
+Using <code class="highlighter-rouge">dp4a</code>, we can implement a dot product between 8-bit integer vectors whose length is evenly divisible by four.
 With an efficient dot product operator, we can implement high-level operators such as 2d convolution and dense layers, as these operators are commonly backed by dot products.</p>
 
 <p>To illustrate, in 2d convolution we accumulate along the channel, the width, and the height axis of the kernel.
-This is a typical use case of <code class="language-plaintext highlighter-rouge">dp4a</code>.
+This is a typical use case of <code class="highlighter-rouge">dp4a</code>.
 TVM uses tensorization to support calling external intrinsics.
-We do not need to modify the original computation declaration; we use the schedule primitive <code class="language-plaintext highlighter-rouge">tensorize</code> to replace the accumulation with <code class="language-plaintext highlighter-rouge">dp4a</code> tensor intrinsic.
+We do not need to modify the original computation declaration; we use the schedule primitive <code class="highlighter-rouge">tensorize</code> to replace the accumulation with the <code class="highlighter-rouge">dp4a</code> tensor intrinsic.
 More details of tensorization can be found in the <a href="https://tvm.apache.org/docs//tutorials/language/tensorize.html">tutorial</a>.</p>
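 
 <p>For illustration, a tensor intrinsic for a 4-element int8 dot product can be declared
 and then substituted via <code class="highlighter-rouge">tensorize</code> roughly as follows. This is a simplified sketch
 (assuming TVM 0.7 or later with the <code class="highlighter-rouge">te</code>/<code class="highlighter-rouge">tir</code> namespaces), not the exact TOPI implementation.</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code>import tvm
 from tvm import te
 
 def dp4a_intrin():
     n = 4
     x = te.placeholder((n,), dtype="int8", name="x")
     y = te.placeholder((n,), dtype="int8", name="y")
     k = te.reduce_axis((0, n), name="k")
     z = te.compute((1,), lambda _: te.sum(
         x[k].astype("int32") * y[k].astype("int32"), axis=k), name="z")
 
     def intrin_func(ins, outs):
         xx, yy = ins
         zz = outs[0]
 
         def instr(index):
             if index == 1:  # reduction init: zero the accumulator
                 return zz.vstore(0, tvm.tir.const(0, "int32"))
             ib = tvm.tir.ir_builder.create()
             vx = xx.vload(0, "int8x4")
             vy = yy.vload(0, "int8x4")
             acc = tvm.tir.const(0, "int32") if index == 0 else zz.vload(0, "int32")
             ib.emit(zz.vstore(0, tvm.tir.call_pure_extern(
                 "int32", "__dp4a", vx, vy, acc)))
             return ib.get()
 
         return instr(0), instr(1), instr(2)  # body, reduce init, reduce update
 
     binds = {t: tvm.tir.decl_buffer(t.shape, t.dtype, t.op.name, offset_factor=1)
              for t in (x, y, z)}
     return te.decl_tensor_intrin(z.op, intrin_func, binds=binds)
 
 # In a conv2d/dense schedule, split the reduction into chunks of four and
 # replace the inner reduction with the intrinsic:
 #   ko, ki = s[C].split(k, factor=4)
 #   s[C].tensorize(ki, dp4a_intrin())
 </code></pre></div>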
 
 <h2 id="data-layout-rearrangement">Data Layout Rearrangement</h2>
 <p>One of the challenges in tensorization is that we may need to design special computation logic to adapt to the requirement of tensor intrinsics.
-Although it is natural to accumulate along the inner axis of the tensor in the dense operator, <code class="language-plaintext highlighter-rouge">conv2d</code> can be more challenging.
-In <code class="language-plaintext highlighter-rouge">conv2d</code> we expect to take a slice in the channel dimension as the input of <code class="language-plaintext highlighter-rouge">dp4a</code> because the number of channels is typically multiple of 4 (otherwise we fall back to original <code class="language-plaintext highlighter-rouge">conv2d</code> in NCHW layout).
+Although it is natural to accumulate along the inner axis of the tensor in the dense operator, <code class="highlighter-rouge">conv2d</code> can be more challenging.
+In <code class="highlighter-rouge">conv2d</code> we expect to take a slice in the channel dimension as the input of <code class="highlighter-rouge">dp4a</code> because the number of channels is typically a multiple of 4 (otherwise we fall back to the original <code class="highlighter-rouge">conv2d</code> in NCHW layout).
 Meanwhile, to achieve memory locality, we would like to reduce along the innermost axis first.
 Taking these factors into account, we use a custom data layout to address this challenge.</p>
 
-<p>In CUDA int8 2d convolution, we empirically choose <code class="language-plaintext highlighter-rouge">NCHW4c</code> as data layout and <code class="language-plaintext highlighter-rouge">OIHW4o4i</code> as weight layout.
-The templates can also be easily generalized to <code class="language-plaintext highlighter-rouge">NCHW[x]c</code> and <code class="language-plaintext highlighter-rouge">OIHW[x]o[x]i</code>, where x is an arbitrary positive integer divisible by four.
+<p>In CUDA int8 2d convolution, we empirically choose <code class="highlighter-rouge">NCHW4c</code> as data layout and <code class="highlighter-rouge">OIHW4o4i</code> as weight layout.
+The templates can also be easily generalized to <code class="highlighter-rouge">NCHW[x]c</code> and <code class="highlighter-rouge">OIHW[x]o[x]i</code>, where x is an arbitrary positive integer divisible by four.
 In the data layout we choose, slices of channels are in the packed innermost dimension.
 Likewise, we pack slices in both the input and output channel dimensions of the weight so that the output has a consistent data layout with the input, which prevents redundant layout transformations between layers.</p>
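 
 <p>Concretely, packing an NCHW tensor into NCHW4c is just a reshape and transpose. A small
 illustration with numpy (the shapes are examples; C must be divisible by four):</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy as np
 
 N, C, H, W = 1, 64, 56, 56
 x = np.random.randn(N, C, H, W).astype("float32")
 # NCHW to NCHW4c: split the channel axis into (C//4, 4) and move the
 # 4-element slice to the innermost position.
 x_nchw4c = x.reshape(N, C // 4, 4, H, W).transpose(0, 1, 3, 4, 2)
 assert x_nchw4c.shape == (N, C // 4, H, W, 4)
 </code></pre></div>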
 
 <p>We show the computation of one element of the output of the 2d convolution in Figure 2.
 The element in each position of the super dimension (the outer dimension of the blocked layout which contains packed elements) NCHW and OIHW is the packed input and kernel, respectively.
 Each column of the packed kernel comes from a different filter.
-We calculate the dot product between the packed input and each row in the packed kernel using <code class="language-plaintext highlighter-rouge">dp4a</code>, and accumulate the result to the output tensor.</p>
+We calculate the dot product between the packed input and each row in the packed kernel using <code class="highlighter-rouge">dp4a</code>, and accumulate the result to the output tensor.</p>
 
 <p style="text-align: center"><img src="/images/cuda-quantized/conv2d.png" alt="image" width="60%" /></p>
 <div>
@@ -219,7 +229,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4
 </div>
 <p></p>
 
-<p>After we have specified the layout of convolution layers, other operators such as <code class="language-plaintext highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
+<p>After we have specified the layout of convolution layers, other operators such as <code class="highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, we can run the whole model in the same layout without extra overhead.</p>
 
 <h2 id="designing-search-space-for-automatic-optimization">Designing Search Space for Automatic Optimization</h2>
@@ -231,8 +241,8 @@ For example, as caching data in the shared memory is a common practice in CUDA p
 We also do some manual tiling such as splitting axes by 4 or 16 to facilitate vectorized memory access.</p>
 
 <p>In quantized 2d convolution, we design a search space that includes a set of tunable options, such as the tile size, the axes to fuse, and configurations of loop unrolling and double buffering.
-The templates of quantized <code class="language-plaintext highlighter-rouge">conv2d</code> and <code class="language-plaintext highlighter-rouge">dense</code> on CUDA are registered under template key <code class="language-plaintext highlighter-rouge">int8</code>.
-During automatic tuning, we can create tuning tasks for these quantized operators by setting the <code class="language-plaintext highlighter-rouge">template_key</code> argument.
+The templates of quantized <code class="highlighter-rouge">conv2d</code> and <code class="highlighter-rouge">dense</code> on CUDA are registered under template key <code class="highlighter-rouge">int8</code>.
+During automatic tuning, we can create tuning tasks for these quantized operators by setting the <code class="highlighter-rouge">template_key</code> argument.
 Details of how to launch automatic optimization can be found in the <a href="https://tvm.apache.org/docs//tutorials/autotvm/tune_relay_cuda.html">AutoTVM tutorial</a>.</p>
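+
+<p>As a sketch of that flow (function signatures vary across TVM versions, so treat this as illustrative rather than exact):</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>from tvm import autotvm, relay
+
+# Extract tuning tasks from the quantized model, then recreate them
+# against the int8 templates registered for CUDA.
+tasks = autotvm.task.extract_from_program(
+    net, target='cuda', params=params, ops=(relay.op.nn.conv2d,))
+tasks = [autotvm.task.create(t.name, t.args, t.target, t.target_host,
+                             template_key='int8')
+         for t in tasks]
+</code></pre>
+</div>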
 
 <h1 id="general-workflow">General Workflow</h1>
@@ -243,22 +253,25 @@ Details of how to launch automatic optimization can be found in the <a href="htt
 
 <p>TVM provides an easy workflow to quantize trained models from other frameworks, automatically optimize operators (with AutoTVM), and deploy to different devices.</p>
 
-<p>First, we use the Relay frontend to import existing models. Here we use an MXNet model with <code class="language-plaintext highlighter-rouge">(1, 3, 224, 224)</code> input shape as an example.</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sym</span><span class="p">,</span> <span class="n">arg_params</span><span class="p">,</span> <span class="n">aux_params</span> <span class="o">=</span> <span class="n">mxnet</span><span class="p">.</span><span class="n">model</span><span class="p">.</span><span class="n">load_checkpoint</span><span class="p">(</span><span class="n">model_path</span><span class="p">,</span> < [...]
-<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">sym</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">{</span><span class="s">'data'</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</sp [...]
-</code></pre></div></div>
+<p>First, we use the Relay frontend to import existing models. Here we use an MXNet model with <code class="highlighter-rouge">(1, 3, 224, 224)</code> input shape as an example.</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">sym</span><span class="p">,</span> <span class="n">arg_params</span><span class="p">,</span> <span class="n">aux_params</span> <span class="o">=</span> <span class="n">mxnet</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">load_checkpoint</span><span class="p">(</span><span class="n">model_path</span><span class="p">,</span> <span class="n">epoch</s [...]
+<span class="n">net</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">from_mxnet</span><span class="p">(</span><span class="n">sym</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">{</span><span class="s">'data'</span><span class="p">:</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</sp [...]
+</code></pre>
+</div>
 
 <p>Next, we use the relay quantization API to convert it to a quantized model.</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">net</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">quantize</span><span class="p">.</span><span class="n">quantize</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">net</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">quantize</span><span class="o">.</span><span class="n">quantize</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>Then, we use AutoTVM to extract tuning tasks for the operators in the model and perform automatic optimization. The <a href="https://tvm.apache.org/docs//tutorials/autotvm/tune_relay_cuda.html">AutoTVM tutorial</a> provides an example for this.</p>
 
 <p>Finally, we build the model and run inference in the quantized mode.</p>
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">relay</span><span class="p">.</span><span class="n">build_config</span><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
-    <span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
-</code></pre></div></div>
-<p>The result of <code class="language-plaintext highlighter-rouge">relay.build</code> is a deployable library.
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">with</span> <span class="n">relay</span><span class="o">.</span><span class="n">build_config</span><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
+    <span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">net</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
+</code></pre>
+</div>
+<p>The result of <code class="highlighter-rouge">relay.build</code> is a deployable library.
 We can either run inference <a href="https://tvm.apache.org/docs//tutorials/frontend/from_mxnet.html#execute-the-portable-graph-on-tvm">on the GPU</a> directly or deploy <a href="https://tvm.apache.org/docs//tutorials/frontend/deploy_model_on_rasp.html#deploy-the-model-remotely-by-rpc">on remote devices</a> via RPC.</p>
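+
+<p>For example, running the compiled result locally on the GPU looks roughly like this (the input name <code class="highlighter-rouge">'data'</code> and shape follow the MXNet example above):</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy as np
+import tvm
+from tvm.contrib import graph_runtime
+
+# Create a runtime module from the build artifacts and run one batch.
+ctx = tvm.gpu(0)
+module = graph_runtime.create(graph, lib, ctx)
+module.set_input(**params)
+module.set_input('data', np.random.uniform(size=(1, 3, 224, 224)).astype('float32'))
+module.run()
+out = module.get_output(0).asnumpy()
+</code></pre>
+</div>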
 
 <h1 id="benchmark">Benchmark</h1>
@@ -280,10 +293,10 @@ We show that automatic optimization in TVM makes it easy and flexible to support
 <h1 id="show-me-the-code">Show Me the Code</h1>
 <ul>
   <li><a href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py">CUDA int8 conv2d</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA int8 group conv2d</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py">CUDA int8 dense</a></li>
-  <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py">Tensor intrinsics declaration</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA int8 conv2d</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA int8 group conv2d</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA int8 dense</a></li>
+  <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor intrinsics declaration</a></li>
 </ul>
 
 <h1 id="bio--acknowledgement">Bio &amp; Acknowledgement</h1>
@@ -294,41 +307,37 @@ We show that automatic optimization in TVM makes it easy and flexible to support
 </div>
 </div>
 
diff --git a/2019/05/30/pytorch-frontend.html b/2019/05/30/pytorch-frontend.html
index c8e2ada..4e89db5 100644
--- a/2019/05/30/pytorch-frontend.html
+++ b/2019/05/30/pytorch-frontend.html
@@ -1,146 +1,156 @@
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Integrating TVM into PyTorch </h1>
       <p class="post-meta">
-        <time datetime="2019-05-30T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2019-05-30T00:00:00-07:00" itemprop="datePublished">
           May 30, 2019
         </time>
         
@@ -159,9 +169,10 @@ To that end, PyTorch now has an official TVM-based backend, <a href="https://git
 
 <p>Usage is simple:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch_tvm
+<div class="highlighter-rouge"><pre class="highlight"><code>import torch_tvm
 torch_tvm.enable()
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>That’s it!  PyTorch will then attempt to convert all operators it can to known Relay operators during its JIT compilation process.</p>
 
@@ -177,11 +188,11 @@ torch_tvm.enable()
 
 <p>To support Relay, two features were added to the PyTorch JIT: custom transformation passes and custom subgraph interpreters.</p>
 
-<p>When <code class="language-plaintext highlighter-rouge">torch_tvm</code> is enabled, subgraphs of PyTorch IR that can be converted to Relay <code class="language-plaintext highlighter-rouge">Expr</code>s will be marked as Relay-compatible.  Since PyTorch IR does not always contain shape information, none of the subgraphs can be compiled in a useful way before invocation.</p>
+<p>When <code class="highlighter-rouge">torch_tvm</code> is enabled, subgraphs of PyTorch IR that can be converted to Relay <code class="highlighter-rouge">Expr</code>s will be marked as Relay-compatible.  Since PyTorch IR does not always contain shape information, none of the subgraphs can be compiled in a useful way before invocation.</p>
 
-<p>During user invocation, the PyTorch JIT runtime will determine input shape information and compile the previously marked subgraphs with the new Relay C++ <a href="https://github.com/pytorch/tvm/blob/main/torch_tvm/compiler.cpp#L226-L246">build system</a>.  The compilation is cached based on input shapes for subsequent runs.  More details can be found in the <a href="https://github.com/pytorch/tvm/blob/main/README.md">README</a>.</p>
+<p>During user invocation, the PyTorch JIT runtime will determine input shape information and compile the previously marked subgraphs with the new Relay C++ <a href="https://github.com/pytorch/tvm/blob/master/torch_tvm/compiler.cpp#L226-L246">build system</a>.  The compilation is cached based on input shapes for subsequent runs.  More details can be found in the <a href="https://github.com/pytorch/tvm/blob/master/README.md">README</a>.</p>
 
-<p><code class="language-plaintext highlighter-rouge">torch_tvm</code> has a continuous benchmark system set up, which is monitoring the performance of ResNet18 on CPU.
+<p><code class="highlighter-rouge">torch_tvm</code> has a continuous benchmark system set up, which monitors the performance of ResNet18 on CPU.
 Out of the box, TVM provides more than twice the performance of the default PyTorch JIT backend for various ResNet models.
 Below is a graph that details the iterations per second achieved with 16 threads on an AWS c5n.4xlarge instance (larger is better):</p>
 
@@ -197,9 +208,9 @@ Below is a graph that details the iterations per second achieved with 16 threads
 
 <h3 id="tutorial">Tutorial</h3>
 
-<p>If you have an already written PyTorch model, the easiest way to get started comes from using <code class="language-plaintext highlighter-rouge">torch.jit.trace</code> as follows</p>
+<p>If you already have a written PyTorch model, the easiest way to get started is to use <code class="highlighter-rouge">torch.jit.trace</code>, as follows:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch_tvm
+<div class="highlighter-rouge"><pre class="highlight"><code>import torch_tvm
 from your_model import model, inputs
 
 torch_tvm.enable(opt_level=3)
@@ -225,14 +236,15 @@ with torch.no_grad():
     tvm_time = time.time() - start
     
     print("Took {}s to run {} iters".format(tvm_time, iters))
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>Much of this code comes from <a href="https://github.com/pytorch/tvm/blob/main/test/benchmarks.py">benchmarks.py</a>.  Note that tuned parameters for AVX2 LLVM compilation is in the <code class="language-plaintext highlighter-rouge">test/</code> folder of the repo.</p>
+<p>Much of this code comes from <a href="https://github.com/pytorch/tvm/blob/master/test/benchmarks.py">benchmarks.py</a>.  Note that the tuned parameters for AVX2 LLVM compilation are in the <code class="highlighter-rouge">test/</code> folder of the repo.</p>
 
 <p>If you are more comfortable using Relay directly, you can extract the expression from a
 PyTorch function via either (implicit) tracing or TorchScript:</p>
 
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def add(a, b, c):
+<div class="highlighter-rouge"><pre class="highlight"><code>def add(a, b, c):
     return a + b + c
 
 # via tracing
@@ -244,7 +256,8 @@ def mul(a, b, c):
 
 # via script
 relay_graph = torch_tvm.to_relay(mul, inputs)
-</code></pre></div></div>
+</code></pre>
+</div>
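+
+<p>From there, the extracted expression can go through the usual Relay compilation flow. A minimal sketch, assuming <code class="highlighter-rouge">relay_graph</code> comes from the snippet above:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>import tvm
+from tvm import relay
+
+# Compile the extracted Relay function for CPU with the standard flow.
+with relay.build_config(opt_level=3):
+    graph, lib, params = relay.build(relay_graph, target='llvm')
+</code></pre>
+</div>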
 
 
     </div>
@@ -252,41 +265,37 @@ relay_graph = torch_tvm.to_relay(mul, inputs)
 </div>
 </div>
 
diff --git a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
index d4e4ec8..c51e923 100644
--- a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
+++ b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
@@ -1,146 +1,156 @@
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Compiling Machine Learning to WASM and WebGPU with Apache TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-05-14T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2020-05-14T00:00:00-07:00" itemprop="datePublished">
           May 14, 2020
         </time>
         
@@ -237,41 +247,37 @@
 </div>
 </div>
 
diff --git a/2020/05/20/bring-your-own-datatypes.html b/2020/05/20/bring-your-own-datatypes.html
new file mode 100644
index 0000000..88744e2
--- /dev/null
+++ b/2020/05/20/bring-your-own-datatypes.html
@@ -0,0 +1,485 @@
+<div class="container">
+<div class="content">
+  <div class="row">
+    <div class="span14">
+      <h1>Bring Your Own Datatypes: Enabling Custom Datatype Exploration in TVM </h1>
+      <p class="post-meta">
+        <time datetime="2020-05-20T00:00:00-07:00" itemprop="datePublished">
+          May 20, 2020
+        </time>
+        
+        • <span itemprop="author" itemscope itemtype="http://schema.org/Person">
+          <span itemprop="name">Gus Smith</span>
+        </span>
+        
+      </p>
+      <p class="post-meta">
+        </p>
+    <br />
+    <p>In this post, we describe the Bring Your Own Datatypes framework, which enables the use of custom datatypes within TVM.</p>
+
+<h2 id="introduction">Introduction</h2>
+
+<p>When designing accelerators, an important decision is how one will approximately represent real numbers in hardware.
+This problem has had a longstanding, industry-standard solution: the IEEE 754 floating-point standard.<sup id="fnref:ieee"><a href="#fn:ieee" class="footnote">1</a></sup>
+Yet,
+  when trying to squeeze
+  the most out of hardware
+  by building highly specialized designs,
+  does it make sense to use
+  general-purpose IEEE 754 floats?
+If we know the numerical requirements
+  of our workload,
+  could we build a smaller,
+  faster,
+  or more power efficient datatype?
+The answer is yes!
+Researchers have already begun experimenting with new datatypes in academic and industrial accelerator designs.
+For example, Google’s Tensor Processing Unit (the TPU) uses the <code class="highlighter-rouge">bfloat</code> type: a single-precision IEEE float which has been truncated to 16 bits.
+Due to the lax numerical requirements
+  of many deep learning workloads,
+  this truncation often has no effect
+  on model accuracy,
+  while instantly cutting the storage cost
+  in half.<sup id="fnref:jouppi2017datacenter"><a href="#fn:jouppi2017datacenter" class="footnote">2</a></sup><sup id="fnref:tensorflowbfloat"><a href="#fn:tensorflowbfloat" class="footnote">3</a></sup></p>
+
+<p>Before researchers begin building hardware for their datatype, however, they first need to determine how their datatype will behave numerically in the workloads they care about.
+This often involves first building a software-emulated version of their datatype
+  (e.g. <a href="http://www.jhauser.us/arithmetic/SoftFloat.html" target="_blank">Berkeley SoftFloat</a> or <a href="https://github.com/cjdelisle/libposit" target="_blank">libposit</a>),
+  and then hacking the datatype directly into workloads,
+  to see how the workload performs
+  using the datatype.
+Even better
+  is to integrate the datatype 
+  directly into compilers themselves,
+  so that many different workloads
+  can be compiled
+  to use the datatype.
+Both routes can be tedious, with the latter route often becoming unmanageable given the size and complexity of modern compilers.
+<a href="https://github.com/xman/tensorflow" target="_blank">One example taken from GitHub</a> shows someone hacking the <em>posit</em> datatype into TensorFlow.
+The result is 237 commits, adding nearly 6000 lines of code and touching over 200 files across the codebase—and that’s just to add one datatype!
+This amount of work is prohibitive for many researchers.</p>
+
+<p>To address these problems, we present the Bring Your Own Datatypes framework.
+The framework enables easy exploration of new datatypes in deep learning workloads by allowing users to plug their simulated datatype into TVM.
+Unlike the posits-in-TensorFlow example above, which enables a single new datatype in a compiler, the Bring Your Own Datatypes framework enables a huge variety of user-defined types.</p>
+
+<h2 id="bring-your-own-datatypes">Bring Your Own Datatypes</h2>
+
+<p>The goal of the Bring Your Own Datatypes framework
+  is to enable users to run deep learning workloads
+  using custom datatypes.
+In the Bring Your Own Datatypes framework,
+  “datatype” means a scalar type:
+  <code class="highlighter-rouge">float32</code>
+  or <code class="highlighter-rouge">uint8</code>, for example.
+We do not handle more complicated data formats
+  such as <a href="https://en.wikipedia.org/wiki/Block_floating_point" target="_blank">block floating point</a>
+  or Intel’s <a href="https://arxiv.org/abs/1711.02213" target="_blank">Flexpoint</a>.
+Additionally,
+  we only claim to support
+  <em>software emulated</em> versions of these scalar datatypes;
+  we do not explicitly support compiling and running on custom datatype hardware.</p>
+
+<p>Each tensor in TVM
+  is assigned a type code,
+  which defines the datatype of the scalars
+  within the tensor.
+A number of these type codes
+  have hard-coded meanings in TVM,
+  mapping to common datatypes
+  such as <code class="highlighter-rouge">int</code> and <code class="highlighter-rouge">float</code>.
+However,
+  the vast majority of type codes
+  are unused.
+The Bring Your Own Datatypes framework
+  allows users to 
+  claim these unused type codes
+  and add their own new datatypes
+  at runtime.</p>
+
+<p>The framework is implemented as
+  a registry 
+  which sits alongside
+  TVM’s normal datatype facilities.
+There are two primary ways
+  in which the user interacts with
+  the datatype registry:
+  first, <strong>datatype registration,</strong>
+  and second, <strong>lowering function registration.</strong>
+These steps are akin to
+  <em>declaration</em> and <em>implementation</em> of the datatype,
+  respectively.</p>
+
+<h3 id="datatype-registration">Datatype Registration</h3>
+
+<p>To register the datatype,
+  the user assigns the datatype
+  a name and a type code,
+  where the type code comes from
+  the range of unused type codes
+  available to custom datatypes.</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">tvm</span><span class="o">.</span><span class="n">datatype</span><span class="o">.</span><span class="n">register</span><span class="p">(</span><span class="s">'bfloat'</span><span class="p">,</span> <span class="mi">150</span><span class="p">)</span>
+</code></pre>
+</div>
+<p>The above code registers
+  the <code class="highlighter-rouge">'bfloat'</code> datatype
+  with type code 150.
+This registration step
+  allows TVM to parse programs
+  which use the custom type:</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s">'x'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span cla [...]
+<span class="n">y</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s">'y'</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'float32'</span><span class="p">)</span>
+<span class="n">x_bfloat</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'custom[bfloat]16'</span><span class="p">)</span>
+<span class="n">y_bfloat</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'custom[bfloat]16'</span><span class="p">)</span>
+<span class="n">z_bfloat</span> <span class="o">=</span> <span class="n">x_bfloat</span> <span class="o">+</span> <span class="n">y_bfloat</span>
+<span class="n">z</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">z_bfloat</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="s">'float32'</span><span class="p">)</span>
+<span class="n">program</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">Function</span><span class="p">([</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">],</span> <span class="n">z</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="n">program</span><span class="p">)</span>
+
+<span class="c"># v0.0.4</span>
+<span class="c"># fn (%x: Tensor[(3), float32], %y: Tensor[(3), float32]) {</span>
+<span class="c">#   %0 = cast(%x, dtype="custom[bfloat]16");</span>
+<span class="c">#   %1 = cast(%y, dtype="custom[bfloat]16");</span>
+<span class="c">#   %2 = add(%0, %1);</span>
+<span class="c">#   cast(%2, dtype="float32")</span>
+<span class="c"># }</span>
+</code></pre>
+</div>
+<p>The program above
+  casts <code class="highlighter-rouge">float32</code> inputs <code class="highlighter-rouge">x</code> and <code class="highlighter-rouge">y</code>
+  into <code class="highlighter-rouge">bfloat</code>s,
+  adds them,
+  and casts the result back to <code class="highlighter-rouge">float32</code>.
+Once the <code class="highlighter-rouge">bfloat</code> type is registered,
+  TVM is able to parse the special <code class="highlighter-rouge">dtype</code> syntax
+  <code class="highlighter-rouge">custom[&lt;typename&gt;]</code>,
+  where <code class="highlighter-rouge">&lt;typename&gt;</code> is the name registered for the type.
+This syntax also supports the usual
+  <code class="highlighter-rouge">&lt;bits&gt;x&lt;lanes&gt;</code> format;
+  here, we use <code class="highlighter-rouge">16</code> to indicate that
+  each <code class="highlighter-rouge">bfloat</code> is 16 bits wide.
+(The number of lanes
+  defaults to 1.)</p>
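+
+<p>For example, a hypothetical vectorized variant of the same type would be written as follows:</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code># 16 bits per bfloat scalar, four lanes per vector element (hypothetical).
+v = relay.var('v', shape=(8,), dtype='custom[bfloat]16x4')
+</code></pre>
+</div>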
+
+<h3 id="lowering-function-registration">Lowering Function Registration</h3>
+
+<p>Though TVM can parse the above program,
+  it cannot yet compile it,
+  as TVM does not yet understand 
+  how to compile operations 
+  over the <code class="highlighter-rouge">bfloat</code> type.
+To compile these programs,
+  we register <em>lowering functions</em> for the custom datatype,
+  which help TVM convert the operations
+  into something it can understand and compile.</p>
+
+<p>Generally, the user is not expected to 
+  lower operations
+  directly to LLVM or CUDA.
+Instead, most code using custom datatypes
+  can be lowered into code which <em>doesn’t</em> use custom datatypes,
+  with some simple tricks.
+We can then rely on native TVM
+  to understand and compile the code.</p>
+
+<p style="text-align: center"><img src="/images/bring-your-own-datatypes/lowering.png" alt="A lowering function lowering an add over `bfloat`s to a library call over `uint16_t`s" width="50%" /></p>
+<center>
+Figure 1: The expected result of a user's registered lowering function. A lowering function should convert a program using custom datatypes to a program which native TVM can understand and compile (in this case, a call to an external library, taking two <tt>uint16_t</tt>s).
+</center>
+<p></p>
+
+<p>Figure 1 shows a common pattern.
+Let’s assume we are
+  interested in exploring the <code class="highlighter-rouge">bfloat</code> type,
+  and have chosen to run some workloads
+  by plugging a <code class="highlighter-rouge">bfloat</code> emulation library (e.g. <a href="https://github.com/biovault/biovault_bfloat16" target="_blank">biovault_bfloat16</a>) into TVM
+  via the Bring Your Own Datatypes framework.
+Our workload is a simple program
+  which adds two <code class="highlighter-rouge">bfloat</code> inputs.
+Native TVM does not understand
+  how to implement <code class="highlighter-rouge">bfloat</code> addition—but it doesn’t need to,
+  as we have a library implementing our datatype!
+The library contains an implementation of <code class="highlighter-rouge">bfloat</code> addition,
+  alongside other operators such as multiplication and square root.
+To implement this <code class="highlighter-rouge">bfloat</code> addition,
+  we’d just like to call into our library.
+Thus, our Add node should become a Call node,
+  calling out to a function (call it <code class="highlighter-rouge">BFloat16Add</code>) in our library.
+To store the bits of the input <code class="highlighter-rouge">bfloat</code>s
+  inside a type that TVM understands,
+  we use 16-bit unsigned integers.
+The resulting program 
+  is one that TVM can understand and compile—it
+  is simply a call to an external library function,
+  taking two unsigned integers.</p>
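+
+<p>To make the pattern concrete, below is a minimal NumPy sketch of the arithmetic a function like <code class="highlighter-rouge">BFloat16Add</code> performs; the real library would be compiled C/C++ called from the generated code, and rounding is simplified to truncation here:</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy as np
+
+def bfloat16_add(a_bits, b_bits):
+    # a_bits, b_bits: uint16 arrays holding bfloat16 bit patterns.
+    # A bfloat16 is the upper half of an IEEE float32, so widen each
+    # operand to float32 (multiplying by 2**16 shifts the bits into the
+    # high half), add, and keep only the high 16 bits of the result.
+    a_f32 = (a_bits.astype(np.uint32) * 65536).view(np.float32)
+    b_f32 = (b_bits.astype(np.uint32) * 65536).view(np.float32)
+    out = a_f32 + b_f32
+    return (out.view(np.uint32) // 65536).astype(np.uint16)
+
+one = np.array([0x3f80], dtype=np.uint16)   # bit pattern of 1.0
+print(hex(bfloat16_add(one, one)[0]))       # 0x4000, i.e. 2.0
+</code></pre>
+</div>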
+
+<p>To achieve the above lowering,
+  we register a lowering function
+  for <code class="highlighter-rouge">bfloat</code>:</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">tvm</span><span class="o">.</span><span class="n">datatype</span><span class="o">.</span><span class="n">register_op</span><span class="p">(</span>
+    <span class="n">tvm</span><span class="o">.</span><span class="n">datatype</span><span class="o">.</span><span class="n">create_lower_func</span><span class="p">(</span><span class="s">'BFloat16Add'</span><span class="p">),</span>
+    <span class="s">'Add'</span><span class="p">,</span> <span class="s">'llvm'</span><span class="p">,</span> <span class="s">'bfloat'</span><span class="p">)</span>
+</code></pre>
+</div>
+<p>The above code registers
+  a lowering function
+  for a specific operator (Add),
+  compilation target (LLVM),
+  and datatype (<code class="highlighter-rouge">bfloat</code>).
+The first argument
+  is the lowering function.
+This can be any function
+  taking a TVM IR node
+  and returning a new TVM IR node.
+In our case,
+  we use a helper function
+  provided by the Bring Your Own Datatypes framework.
+<code class="highlighter-rouge">tvm.datatype.create_lower_func('BFloat16Add')</code>
+  creates a lowering function
+  for the common pattern described above.
+The resulting function
+  converts the arguments of the given node
+  to <code class="highlighter-rouge">uint16_t</code>,
+  and then converts the node itself
+  into a call to the given function name
+  (in this case, <code class="highlighter-rouge">'BFloat16Add'</code>).</p>
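+
+<p>For the curious, here is a rough sketch
+  of what an equivalent hand-written lowering function might look like
+  for the addition case.
+This is purely illustrative:
+  it assumes the flat <code class="highlighter-rouge">tvm.call_pure_extern</code> API
+  of this era of TVM,
+  and it elides the operand dtype rewriting
+  that <code class="highlighter-rouge">create_lower_func</code> performs for us.</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>def bfloat_add_to_extern(add_op):
+    # Sketch only: turn an Add node over bfloat into an extern call,
+    # typed as the 16-bit storage type. (create_lower_func additionally
+    # rewrites the operands' dtypes to uint16; that step is elided here.)
+    return tvm.call_pure_extern('uint16', 'BFloat16Add', add_op.a, add_op.b)
+</code></pre>
+</div>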
+
+<p>To implement a custom datatype,
+  the user will need to register
+  a lowering function for every operator
+  in the workload they would like to run.
+For a network like ResNet,
+  this will be around 10 operators,
+  including things like Add, Div, various Casts, and Max.
+In our tests,
+  registering a datatype
+  and all lowering functions
+  takes around 40 lines of Python.
+Once all needed operators
+  are registered,
+  custom datatype workloads
+  can be run
+  as easily as
+  any other TVM program!</p>
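+
+<p>As a sketch of what those registrations look like in practice,
+  the remaining operators follow the same pattern as Add.
+The library symbols below
+  (<code class="highlighter-rouge">BFloat16Mul</code>, <code class="highlighter-rouge">BFloat16Div</code>)
+  are assumed exports of the emulation library,
+  named here purely for illustration:</p>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code># one registration per (operator, target, datatype) triple
+tvm.datatype.register_op(
+    tvm.datatype.create_lower_func('BFloat16Mul'),
+    'Mul', 'llvm', 'bfloat')
+tvm.datatype.register_op(
+    tvm.datatype.create_lower_func('BFloat16Div'),
+    'Div', 'llvm', 'bfloat')
+</code></pre>
+</div>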
+
+<h1 id="wrapping-up">Wrapping Up</h1>
+
+<p>The Bring Your Own Datatypes framework
+  brings user-defined datatypes to TVM.
+We hope this will encourage datatype researchers
+  to use TVM in their research;
+  similarly,
+  we hope this will spark interest
+  in custom datatypes
+  within the deep learning community.
+The Bring Your Own Datatypes framework
+  partially exists in TVM at the moment,
+  and more will be merged in (including full documentation)
+  in the coming months.</p>
+
+<hr />
+
+<p><em>Gus Smith is a PhD student at the University of Washington working with Luis Ceze and Zachary Tatlock at the intersection of computer architecture and programming languages. His website is <a href="https://justg.us" target="_blank">justg.us</a>.</em></p>
+
+<h2 id="references">References</h2>
+
+<div class="footnotes">
+  <ol>
+    <li id="fn:ieee">
+      <p><a href="https://standards.ieee.org/standard/754-2019.html" target="_blank">754-2019 - IEEE Standard for Floating-Point Arithmetic</a>&nbsp;<a href="#fnref:ieee" class="reversefootnote">&#8617;</a></p>
+    </li>
+    <li id="fn:jouppi2017datacenter">
+      <p>Jouppi, Norman P., et al. “In-datacenter performance analysis of a tensor processing unit.” Proceedings of the 44th Annual International Symposium on Computer Architecture. 2017.&nbsp;<a href="#fnref:jouppi2017datacenter" class="reversefootnote">&#8617;</a></p>
+    </li>
+    <li id="fn:tensorflowbfloat">
+      <p><a href="https://cloud.google.com/tpu/docs/bfloat16" target="_blank">Using bfloat16 with TensorFlow models</a>&nbsp;<a href="#fnref:tensorflowbfloat" class="reversefootnote">&#8617;</a></p>
+    </li>
+  </ol>
+</div>
+
+    </div>
+  </div>
+</div>
+</div>
+
+
+    
+
+
+
+
+
+    <div class="container">
+
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
+
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
+
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
+
+      </footer>
+    </div>
+  </body>
+</html>
+
+
diff --git a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
index 08c5e67..1d2252b 100644
--- a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
+++ b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>TinyML - How TVM is Taming Tiny</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>TinyML - How TVM is Taming Tiny </h1>
       <p class="post-meta">
-        <time datetime="2020-06-04T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2020-06-04T00:00:00-07:00" itemprop="datePublished">
           Jun 4, 2020
         </time>
         
@@ -170,35 +180,38 @@ A standard µTVM setup, where the host communicates with the device via JTAG.</p
 
 <p>Above, we have an <a href="https://www.st.com/en/microcontrollers-microprocessors/stm32f746zg.html">STM32F746ZG board</a>, housing an ARM Cortex-M7 processor, an ideal part for AI on the edge given its strong performance in a low-power envelope. We use its USB-JTAG port to connect it to our desktop machine.  On the desktop, we run OpenOCD to open a JTAG connection with the device; in turn, OpenOCD allows µTVM to control the M7 processor using a device-agnostic TCP socket.  With this  [...]
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">OPENOCD_SERVER_ADDR</span> <span class="o">=</span> <span class="s">'127.0.0.1'</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">OPENOCD_SERVER_ADDR</span> <span class="o">=</span> <span class="s">'127.0.0.1'</span>
 <span class="n">OPENOCD_SERVER_PORT</span> <span class="o">=</span> <span class="mi">6666</span>
-<span class="n">TARGET</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">target</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="s">'c -device=micro_dev'</span><span class="p">)</span>
-<span class="n">DEV_CONFIG</span> <span class="o">=</span> <span class="n">stm32f746xx</span><span class="p">.</span><span class="n">default_config</span><span class="p">(</span><span class="n">OPENOCD_SERVER_ADDR</span><span class="p">,</span> <span class="n">OPENOCD_SERVER_PORT</span><span class="p">)</span>
+<span class="n">TARGET</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">target</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="s">'c -device=micro_dev'</span><span class="p">)</span>
+<span class="n">DEV_CONFIG</span> <span class="o">=</span> <span class="n">stm32f746xx</span><span class="o">.</span><span class="n">default_config</span><span class="p">(</span><span class="n">OPENOCD_SERVER_ADDR</span><span class="p">,</span> <span class="n">OPENOCD_SERVER_PORT</span><span class="p">)</span>
 
 <span class="n">module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">get_cifar10_cnn</span><span class="p">()</span>
-<span class="k">with</span> <span class="n">micro</span><span class="p">.</span><span class="n">Session</span><span class="p">(</span><span class="n">device_config</span><span class="p">)</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
-	<span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class="o">=</span><span class="n">TARGET</span><span class="p">,</span> <span cl [...]
-  <span class="n">micro_mod</span> <span class="o">=</span> <span class="n">micro</span><span class="p">.</span><span class="n">create_micro_mod</span><span class="p">(</span><span class="n">c_module</span><span class="p">,</span> <span class="n">DEV_CONFIG</span><span class="p">)</span>
-  <span class="n">graph_mod</span> <span class="o">=</span> <span class="n">graph_runtime</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">micro_mod</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">tvm</span><span class="p">.</span><span class="n">micro_dev</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
-  <span class="n">graph_mod</span><span class="p">.</span><span class="n">run</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data_np</span><span class="p">)</span>
-  <span class="n">prediction</span> <span class="o">=</span> <span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">graph_mod</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">asnumpy</span><span class="p">())]</span>
-  <span class="k">print</span><span class="p">(</span><span class="s">f'prediction was </span><span class="si">{</span><span class="n">prediction</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="k">with</span> <span class="n">micro</span><span class="o">.</span><span class="n">Session</span><span class="p">(</span><span class="n">device_config</span><span class="p">)</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
+	<span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class="o">=</span><span class="n">TARGET</span><span class="p">,</span> <span cl [...]
+  <span class="n">micro_mod</span> <span class="o">=</span> <span class="n">micro</span><span class="o">.</span><span class="n">create_micro_mod</span><span class="p">(</span><span class="n">c_module</span><span class="p">,</span> <span class="n">DEV_CONFIG</span><span class="p">)</span>
+  <span class="n">graph_mod</span> <span class="o">=</span> <span class="n">graph_runtime</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">micro_mod</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">tvm</span><span class="o">.</span><span class="n">micro_dev</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
+  <span class="n">graph_mod</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">data_np</span><span class="p">)</span>
+  <span class="n">prediction</span> <span class="o">=</span> <span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">graph_mod</span><span class="o">.</span><span class="n">get_output</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">())]</span>
+  <span class="k">print</span><span class="p">(</span><span class="n">f</span><span class="s">'prediction was {prediction}'</span><span class="p">)</span>
+</code></pre>
+</div>
 
-<p>Below are the performance results of MicroTVM, compared with <a href="https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0">CMSIS-NN version 5.7.0</a> (commit <code class="language-plaintext highlighter-rouge">a65b7c9a</code>), a hand-optimized library of ML kernels.</p>
+<p>Below are the performance results of MicroTVM, compared with <a href="https://github.com/ARM-software/CMSIS_5/releases/tag/5.6.0">CMSIS-NN version 5.7.0</a> (commit <code class="highlighter-rouge">a65b7c9a</code>), a hand-optimized library of ML kernels.</p>
 
 <p style="text-align: center"><img src="/images/microtvm/post-2020-05-28/cifar10-int-8-cnn.png" alt="/images/microtvm/post-2020-05-28/cifar10-int-8-cnn.png" width="60%" /><br /></p>
 
 <p>As we can see, the out-of-the-box performance isn’t great, but this is where <a href="https://dl.acm.org/doi/10.5555/3327144.3327258">AutoTVM</a> comes to the rescue.  We can write a schedule template for our device, do a round of autotuning, then achieve significantly better results.  To plug in our autotuned results, we only need to replace this line:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">t [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class [...]
+</code></pre>
+</div>
 
 <p>with these lines:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">TARGET</span><span class="p">,</span> <span class="n">autotvm</span><span class="p">.</span><span class="n">apply_history_best</span><span class="p">(</span><span class="n">TUNING_RESULTS_FILE</span><span class="p">):</span>
-  <span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class="o">=</span><span class="n">TARGET</span><span class="p">,</span> <span c [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">with</span> <span class="n">TARGET</span><span class="p">,</span> <span class="n">autotvm</span><span class="o">.</span><span class="n">apply_history_best</span><span class="p">(</span><span class="n">TUNING_RESULTS_FILE</span><span class="p">):</span>
+  <span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class="o">=</span><span class="n">TARGET</span><span class="p">,</span> <span c [...]
+</code></pre>
+</div>
 
 <p>And our results now look like this:</p>
 
@@ -222,25 +235,26 @@ The µTVM Device Memory Layout in RAM</p>
 
 <p>Most bare-metal devices have support for C and JTAG (a debugging protocol), so (1) and (2) usually come for free!  Furthermore, (3) and (4) are often very small asks.  Below are examples of (3) and (4) for STM32F746-series boards.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">device_config</span> <span class="o">=</span> <span class="p">{</span>
-    <span class="s">'device_id'</span><span class="p">:</span> <span class="s">'arm.stm32f746xx'</span><span class="p">,</span>        <span class="c1"># unique identifier for the device
-</span>    <span class="s">'toolchain_prefix'</span><span class="p">:</span> <span class="s">'arm-none-eabi-'</span><span class="p">,</span>  <span class="c1"># prefix of each binary in the cross-compilation toolchain (e.g., arm-none-eabi-gcc)
-</span>    <span class="s">'base_addr'</span><span class="p">:</span> <span class="mh">0x20000000</span><span class="p">,</span>               <span class="c1"># first address of RAM
-</span>    <span class="s">'section_sizes'</span><span class="p">:</span> <span class="p">{</span>                     <span class="c1"># dictionary of desired section sizes in bytes
-</span>         <span class="s">'text'</span><span class="p">:</span> <span class="mi">18000</span><span class="p">,</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">device_config</span> <span class="o">=</span> <span class="p">{</span>
+    <span class="s">'device_id'</span><span class="p">:</span> <span class="s">'arm.stm32f746xx'</span><span class="p">,</span>        <span class="c"># unique identifier for the device</span>
+    <span class="s">'toolchain_prefix'</span><span class="p">:</span> <span class="s">'arm-none-eabi-'</span><span class="p">,</span>  <span class="c"># prefix of each binary in the cross-compilation toolchain (e.g., arm-none-eabi-gcc)</span>
+    <span class="s">'base_addr'</span><span class="p">:</span> <span class="mh">0x20000000</span><span class="p">,</span>               <span class="c"># first address of RAM</span>
+    <span class="s">'section_sizes'</span><span class="p">:</span> <span class="p">{</span>                     <span class="c"># dictionary of desired section sizes in bytes</span>
+         <span class="s">'text'</span><span class="p">:</span> <span class="mi">18000</span><span class="p">,</span>
          <span class="s">'rodata'</span><span class="p">:</span> <span class="mi">100</span><span class="p">,</span>
          <span class="s">'data'</span><span class="p">:</span> <span class="mi">100</span><span class="p">,</span>
-         <span class="p">...</span>
+         <span class="o">...</span>
     <span class="p">},</span>
-    <span class="s">'word_size'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span>                        <span class="c1"># device word size
-</span>    <span class="s">'thumb_mode'</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>                    <span class="c1"># whether to use ARM's thumb ISA
-</span>    <span class="s">'comms_method'</span><span class="p">:</span> <span class="s">'openocd'</span><span class="p">,</span>             <span class="c1"># method of communication with the device
-</span>    <span class="s">'server_addr'</span><span class="p">:</span> <span class="s">'127.0.0.1'</span><span class="p">,</span>            <span class="c1"># OpenOCD server address (if 'comms_method' is 'openocd')
-</span>    <span class="s">'server_port'</span><span class="p">:</span> <span class="mi">6666</span><span class="p">,</span>                   <span class="c1"># OpenOCD server port (if 'comms_method' is 'openocd')
-</span><span class="p">}</span>
-</code></pre></div></div>
-
-<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">.</span><span class="n">syntax</span> <span class="n">unified</span>
+    <span class="s">'word_size'</span><span class="p">:</span> <span class="mi">4</span><span class="p">,</span>                        <span class="c"># device word size</span>
+    <span class="s">'thumb_mode'</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>                    <span class="c"># whether to use ARM's thumb ISA</span>
+    <span class="s">'comms_method'</span><span class="p">:</span> <span class="s">'openocd'</span><span class="p">,</span>             <span class="c"># method of communication with the device</span>
+    <span class="s">'server_addr'</span><span class="p">:</span> <span class="s">'127.0.0.1'</span><span class="p">,</span>            <span class="c"># OpenOCD server address (if 'comms_method' is 'openocd')</span>
+    <span class="s">'server_port'</span><span class="p">:</span> <span class="mi">6666</span><span class="p">,</span>                   <span class="c"># OpenOCD server port (if 'comms_method' is 'openocd')</span>
+<span class="p">}</span>
+</code></pre>
+</div>
+
+<div class="language-cpp highlighter-rouge"><pre class="highlight"><code><span class="p">.</span><span class="n">syntax</span> <span class="n">unified</span>
 <span class="p">.</span><span class="n">cpu</span> <span class="n">cortex</span><span class="o">-</span><span class="n">m7</span>
 <span class="p">.</span><span class="n">fpu</span> <span class="n">softvfp</span>
 <span class="p">.</span><span class="n">thumb</span>
@@ -260,22 +274,24 @@ The µTVM Device Memory Layout in RAM</p>
   <span class="n">ldr</span> <span class="n">sp</span><span class="p">,</span> <span class="o">=</span><span class="n">_utvm_stack_pointer_init</span>
   <span class="n">bl</span> <span class="n">UTVMMain</span>
 <span class="p">.</span><span class="n">size</span> <span class="n">UTVMInit</span><span class="p">,</span> <span class="p">.</span><span class="o">-</span><span class="n">UTVMInit</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>The µTVM infrastructure and device runtime have been built to make use of only these requirements, and we’re working to lessen them by supporting common open-source runtime platforms such as Mbed OS to handle the compilation and linking processes.</p>
 
 <h2 id="device-sessions">Device Sessions</h2>
 
-<p>Given the networked nature of microcontroller interaction, we slightly deviate from standard TVM code by introducing the concept of <code class="language-plaintext highlighter-rouge">MicroSession</code>.</p>
+<p>Given the networked nature of microcontroller interaction, we slightly deviate from standard TVM code by introducing the concept of <code class="highlighter-rouge">MicroSession</code>.</p>
 
 <p>Every piece of functionality in µTVM relies on having an open session with the target device.  If you’re familiar with TVM, you may have noticed a line of code that deviates from the norm in our first code snippet—namely, this one:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">...</span>
-<span class="k">with</span> <span class="n">micro</span><span class="p">.</span><span class="n">Session</span><span class="p">(</span><span class="n">device_config</span><span class="p">)</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
-	<span class="p">...</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="o">...</span>
+<span class="k">with</span> <span class="n">micro</span><span class="o">.</span><span class="n">Session</span><span class="p">(</span><span class="n">device_config</span><span class="p">)</span> <span class="k">as</span> <span class="n">sess</span><span class="p">:</span>
+	<span class="o">...</span>
+</code></pre>
+</div>
 
-<p>Every line inside this <code class="language-plaintext highlighter-rouge">with</code> block can call functions in µTVM, with the context being the device specified by <code class="language-plaintext highlighter-rouge">device_config</code>.  This line is doing a number of things under the hood, so let’s unpack it.</p>
+<p>Every line inside this <code class="highlighter-rouge">with</code> block can call functions in µTVM, with the context being the device specified by <code class="highlighter-rouge">device_config</code>.  This line is doing a number of things under the hood, so let’s unpack it.</p>
 
 <p>First, it initializes a connection with your device, using whichever communication method you specified (usually OpenOCD).  The µTVM device runtime is then cross-compiled, using whichever cross-compiler you specified.  Finally, space for the compiled binary is allocated by the host, and the binary is loaded onto the device using the opened connection.</p>
 
@@ -285,34 +301,38 @@ The µTVM Device Memory Layout in RAM</p>
 
 <p>One of the core abstractions in TVM is that of a module.  A module stores a set of related functions for a particular device/runtime target.  Given that microcontrollers don’t normally have operating systems, µTVM needs to do a lot of extra work to maintain this high-level abstraction.  To see what’s going on, we’ll trace through the process of creating and loading a µTVM-compatible module.</p>
 
-<p>Suppose we have a <code class="language-plaintext highlighter-rouge">micro.Session</code> open with our device and a TVM schedule that implements 2D convolution.  If we want to load it onto our microcontroller, we need it to emit C code.  To do so, we just need to set the <code class="language-plaintext highlighter-rouge">target</code> in either <code class="language-plaintext highlighter-rouge">tvm.build</code> or <code class="language-plaintext highlighter-rouge">relay.build</code>. [...]
+<p>Suppose we have a <code class="highlighter-rouge">micro.Session</code> open with our device and a TVM schedule that implements 2D convolution.  If we want to load it onto our microcontroller, we need it to emit C code.  To do so, we just need to set the <code class="highlighter-rouge">target</code> in either <code class="highlighter-rouge">tvm.build</code> or <code class="highlighter-rouge">relay.build</code>.  Example:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">t [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">graph</span><span class="p">,</span> <span class="n">c_module</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">module</span><span class="p">[</span><span class="s">'main'</span><span class="p">],</span> <span class="n">target</span><span class [...]
+</code></pre>
+</div>
 
-<p>By setting the target like so, the build process runs through our C code generation backend.  However, the resulting C module still resides on the host machine.  In order to load it onto the device, we run it through one of the core functions in the µTVM infrastructure: <code class="language-plaintext highlighter-rouge">create_micro_mod</code>.  Example:</p>
+<p>By setting the target like so, the build process runs through our C code generation backend.  However, the resulting C module still resides on the host machine.  In order to load it onto the device, we run it through one of the core functions in the µTVM infrastructure: <code class="highlighter-rouge">create_micro_mod</code>.  Example:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">micro_mod</span> <span class="o">=</span> <span class="n">micro</span><span class="p">.</span><span class="n">create_micro_mod</span><span class="p">(</span><span class="n">c_module</span><span class="p">,</span> <span class="n">DEV_CONFIG</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">micro_mod</span> <span class="o">=</span> <span class="n">micro</span><span class="o">.</span><span class="n">create_micro_mod</span><span class="p">(</span><span class="n">c_module</span><span class="p">,</span> <span class="n">DEV_CONFIG</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>The line above cross-compiles the C source within the module, allocates room for the resulting binary (so it can coexist with the runtime in device memory), then sends each section of the binary to its allocated slot on the device.  Once the module binary is snug in device memory, function pointers within the binary are patched to give the module access to helper functions in the device runtime (e.g., for allocating scratchpads).</p>
 
 <p>Now, with our kernel loaded on the device, we can grab a remote handle to the convolution function like so:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">micro_func</span> <span class="o">=</span> <span class="n">micro_mod</span><span class="p">[</span><span class="s">'conv2d'</span><span class="p">]</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">micro_func</span> <span class="o">=</span> <span class="n">micro_mod</span><span class="p">[</span><span class="s">'conv2d'</span><span class="p">]</span>
+</code></pre>
+</div>
 
 <h2 id="tensor-loading">Tensor Loading</h2>
 
 <p>If we want to call an operator, we first need some tensors as arguments:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">data_np</span><span class="p">,</span> <span class="n">kernel_np</span> <span class="o">=</span> <span class="n">get_conv_inputs</span><span class="p">()</span>
-<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">micro_dev</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">data_np</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
-<span class="n">kernel</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">kernel_np</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">data_np</span><span class="p">,</span> <span class="n">kernel_np</span> <span class="o">=</span> <span class="n">get_conv_inputs</span><span class="p">()</span>
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">micro_dev</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">data_np</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+<span class="n">kernel</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">kernel_np</span><span class="p">,</span> <span class="n">ctx</span><span class="o">=</span><span class="n">ctx</span><span class="p">)</span>
+</code></pre>
+</div>
 
-<p>Based on its data type (e.g., <code class="language-plaintext highlighter-rouge">int8</code>, <code class="language-plaintext highlighter-rouge">float32</code>, etc.) and shape, each tensor’s size in bytes is calculated, and the host allocates a region of memory on the device’s heap.  The tensor’s data is then loaded into the allocated region.</p>
+<p>Based on its data type (e.g., <code class="highlighter-rouge">int8</code>, <code class="highlighter-rouge">float32</code>, etc.) and shape, each tensor’s size in bytes is calculated, and the host allocates a region of memory on the device’s heap.  The tensor’s data is then loaded into the allocated region.</p>
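+
+<p>The output tensor used in the next section is allocated the same way; a minimal sketch, where <code class="highlighter-rouge">output_shape</code> is a stand-in (not from the original example) for the convolution’s result shape:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy as np
+
+# reserve device memory for the result; `output_shape` is a placeholder,
+# and we match the input's dtype purely for illustration
+output = tvm.nd.array(np.zeros(output_shape, dtype=data.dtype), ctx=ctx)
+</code></pre>
+</div>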
 
 <h2 id="function-calls">Function Calls</h2>
 
@@ -322,12 +342,13 @@ The µTVM Device Memory Layout in RAM</p>
 
 <p>When calling a function, both input and output tensors are passed as arguments, in what’s known as destination-passing style:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">conv2D</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">output</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">conv2D</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">kernel</span><span class="p">,</span> <span class="n">output</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>Given that these tensors are already allocated on the device, we only need to send <em>metadata</em> to the device (device address, shape, and data type), so it knows which of its resident tensors to use.  The runtime representation of a function call includes this metadata, as well as the address of the function being called (shown below).  Before constructing this representation, the metadata needs to be serialized into the arguments section on the device that exists expressly for t [...]
 
-<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/*
+<div class="language-c highlighter-rouge"><pre class="highlight"><code><span class="cm">/*
  * task struct for uTVM
  */</span>
 <span class="k">typedef</span> <span class="k">struct</span> <span class="p">{</span>
@@ -340,15 +361,16 @@ The µTVM Device Memory Layout in RAM</p>
   <span class="cm">/* number of arguments */</span>
   <span class="kt">int32_t</span> <span class="n">num_args</span><span class="p">;</span>
 <span class="p">}</span> <span class="n">UTVMTask</span><span class="p">;</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>In the strict setting, there is a single global <code class="language-plaintext highlighter-rouge">UTVMTask</code> instance that we, from the host side, write into.  Once we have written to the task, the runtime has everything it needs to execute the function, and we can begin execution at the runtime’s entry point.  The runtime will perform some lightweight initialization, run our operator, then return control to the host.</p>
+<p>In the strict setting, there is a single global <code class="highlighter-rouge">UTVMTask</code> instance that we, from the host side, write into.  Once we have written to the task, the runtime has everything it needs to execute the function, and we can begin execution at the runtime’s entry point.  The runtime will perform some lightweight initialization, run our operator, then return control to the host.</p>
 
 <h3 id="lazy-execution">Lazy Execution</h3>
 
 <p>In practice, executing operators as soon as the user requests them becomes prohibitively expensive, as communication overhead begins to dominate.  We can improve the throughput of our system by delaying evaluation until the user wants the results of the call.</p>
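 
 <p>At the API level nothing changes; a sketch of the intended behavior, assuming that reading a result back is what forces the queued calls to run:</p>
 
 <div class="language-python highlighter-rouge"><pre class="highlight"><code># under lazy execution, calls like this one only enqueue work...
 conv2D(data, kernel, output)
 # ...and the queued tasks are flushed to the device once a result
 # is actually needed, e.g. when reading the output back to the host
 out_np = output.asnumpy()
 </code></pre>
 </div>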
 
-<p>From an implementation standpoint, instead of eagerly serializing argument metadata and <code class="language-plaintext highlighter-rouge">UTVMTask</code> data, we now need to accumulate function call metadata on the host side, before flushing it to the device.  The device runtime also needs a few changes: (1) we must now have a global array of <code class="language-plaintext highlighter-rouge">UTVMTask</code> and (2) we need to loop through and execute each task in order.</p>
+<p>From an implementation standpoint, instead of eagerly serializing argument metadata and <code class="highlighter-rouge">UTVMTask</code> data, we now need to accumulate function call metadata on the host side, before flushing it to the device.  The device runtime also needs a few changes: (1) we must now have a global array of <code class="highlighter-rouge">UTVMTask</code> and (2) we need to loop through and execute each task in order.</p>
 
 <h2 id="autotvm-with-microtvm">AutoTVM with MicroTVM</h2>
 
@@ -387,7 +409,7 @@ Diagram of CIFAR-10 CNN</p>
 
 <h2 id="methodology">Methodology</h2>
 
-<p>In our experiments, we use TVM from HEAD (commit <code class="language-plaintext highlighter-rouge">9fa8341</code>), version 5.7.0 of CMSIS-NN (commit <code class="language-plaintext highlighter-rouge">a65b7c9a</code>), version 1.16.0 of STM32CubeF7, and GCC from Arm’s GNU Tools for Arm Embedded Processors 9-2019-q4-major 9.2.1 toolchain (revision 277599).  The host machine used in our experiments runs Ubuntu Linux 18.04.4 LTS and sports an AMD Ryzen Threadripper 2990WX 32-Core Proces [...]
+<p>In our experiments, we use TVM from HEAD (commit <code class="highlighter-rouge">9fa8341</code>), version 5.7.0 of CMSIS-NN (commit <code class="highlighter-rouge">a65b7c9a</code>), version 1.16.0 of STM32CubeF7, and GCC from Arm’s GNU Tools for Arm Embedded Processors 9-2019-q4-major 9.2.1 toolchain (revision 277599).  The host machine used in our experiments runs Ubuntu Linux 18.04.4 LTS and sports an AMD Ryzen Threadripper 2990WX 32-Core Processor with 62GB of RAM.  All evaluation  [...]
 
 <h3 id="arm-specific-optimizations">Arm-Specific Optimizations</h3>
 
@@ -405,7 +427,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</p>
 <p>There are certainly other optimizations we could pull from CMSIS-NN to close the gap even further:</p>
 
 <ul>
-  <li>Batch expansion of <code class="language-plaintext highlighter-rouge">int8</code> weights into <code class="language-plaintext highlighter-rouge">int16</code>, to cut down on duplicate expansion for SIMD</li>
+  <li>Batch expansion of <code class="highlighter-rouge">int8</code> weights into <code class="highlighter-rouge">int16</code>, to cut down on duplicate expansion for SIMD</li>
   <li>Splitting convolution into 3x3 tiles to reduce padding checks</li>
 </ul>
 
@@ -420,10 +442,10 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</p>
 <p><a href="https://github.com/areusch/microtvm-blogpost-eval">https://github.com/areusch/microtvm-blogpost-eval</a></p>
 
 <p style="text-align: center"><img src="/images/microtvm/post-2020-05-28/autotuned-cifar10-int-8-cnn.png" alt="/images/microtvm/post-2020-05-28/autotuned-cifar10-int-8-cnn.png" width="60%" /><br />
-<code class="language-plaintext highlighter-rouge">int8</code>-quantized CIFAR-10 CNN comparison on an Arm STM32F746NG (re-posted from above)</p>
+<code class="highlighter-rouge">int8</code>-quantized CIFAR-10 CNN comparison on an Arm STM32F746NG (re-posted from above)</p>
 
 <p style="text-align: center"><img src="/images/microtvm/post-2020-05-28/autotuned-cifar10-int-8-cnn-x86.png" alt="/images/microtvm/post-2020-05-28/autotuned-cifar10-int-8-cnn-x86.png" width="60%" /><br />
-<code class="language-plaintext highlighter-rouge">int8</code>-quantized CIFAR-10 CNN comparison on µTVM’s emulated host device</p>
+<code class="highlighter-rouge">int8</code>-quantized CIFAR-10 CNN comparison on µTVM’s emulated host device</p>
 
 <p>On the Arm STM32-series board, we were able to improve performance by ~2x compared to the initial untuned operators, and we achieved results much closer to CMSIS-NN.  Additionally, we were able to significantly improve performance on the host emulated device.  Though the x86 <strong><em>numbers</em></strong> don’t mean much, they show we can use the same infrastructure (µTVM) to optimize performance on vastly different architectures.</p>
 
@@ -459,41 +481,37 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</p>
 </div>
 </div>
 
+
     
 
 
 
 
-  <script src="https://code.jquery.com/jquery-2.2.0.min.js" type="text/javascript"></script>
-  <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
-  <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
-  <!-- <script src="./assets/js/slick.js"></script> -->
-  <script src="/assets/js/custome.js"></script>
-  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
-  <script>
-    window.dataLayer = window.dataLayer || [];
-    function gtag(){dataLayer.push(arguments);}
-    gtag('js', new Date());
-    gtag('config', 'UA-75982049-2');
-  </script>
-</body>
-<section class="footerSec">
-  <div class="footerHeader">
-    <ul class="container d-flex align-md-items-center justify-content-between flex-column flex-md-row">
-      <li class="logo">
-
-        <p><a href="/"><img src="/assets/images/logo.svg" alt="logo" title="logo" /></a></p>
-      </li>
-      <li class="copywrite d-flex align-items-center">
-        <h5 id="apache-software-foundation--all-right-reserved">© 2020 Apache Software Foundation | All right reserved</h5>
-      </li>
-    </ul>
 
-  </div>
+    <div class="container">
+
+      <footer class="small">
+        Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF),
+        sponsored by the <i>Apache Incubator</i>. Incubation is required
+        of all newly accepted projects until a further review indicates that the infrastructure,
+        communications, and decision making process have stabilized in a manner consistent with other
+        successful ASF projects. While incubation status is not necessarily a reflection of the completeness
+        or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
+
+        Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache,
+        the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.
 
-  <ul class="container">
-    <li class="footernote">Apache TVM is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does in [...]
-  </ul>
+        See also other useful <a href="/asf" class="footer-link">ASF links</a>:
+        <a href="https://www.apache.org/" class="footer-link">Apache Homepage</a>,
+        <a href="https://www.apache.org/licenses/" class="footer-link">License</a>
+        <a href="https://www.apache.org/foundation/sponsorship.html" class="footer-link">Sponsorship</a>,
+        <a href="https://www.apache.org/security/" class="footer-link">Security</a>
+        <a href="https://www.apache.org/foundation/thanks.html" class="footer-link">Thanks</a>,
+        <a href="https://www.apache.org/events/current-event.html" class="footer-link">Current Event</a>
 
-</section>
+      </footer>
+    </div>
+  </body>
 </html>
+
+
diff --git a/2020/07/14/bert-pytorch-tvm.html b/2020/07/14/bert-pytorch-tvm.html
index 43cc791..ded7049 100644
--- a/2020/07/14/bert-pytorch-tvm.html
+++ b/2020/07/14/bert-pytorch-tvm.html
@@ -1,146 +1,156 @@
+
+<!DOCTYPE html>
 <html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <head>
+    <meta charset="utf-8">
     <title>Bridging PyTorch and TVM</title>
-    <link rel="shortcut icon" href="/assets/images/favicon.ico">
-    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
-    <link rel="stylesheet" href="/css/slick.css">
-    <link rel="stylesheet" href="/css/slick-theme.css">
-    <link rel="stylesheet" href="/css/custom.css">
+    
+    <meta name="author" content="">
+
+    <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->
+    <!--[if lt IE 9]>
+      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
+    <![endif]-->
+
+    <!-- Le styles -->
+    <link href="/assets/themes/custom-twitter/css/1.4.0/bootstrap.css" rel="stylesheet">
+    <link href="/assets/themes/custom-twitter/css/style.css?body=1" rel="stylesheet" type="text/css" media="all">
+
+    <!-- Le fav and touch icons -->
+  <!-- Update these with your own images
+    <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  <link rel="shortcut icon" href="images/logo/tvm-logo.png">
+  -->
+  <link href="/images/logo/tvm-logo-square.png" rel="icon" type="image/png"/>
+  <!-- Global site tag (gtag.js) - Google Analytics -->
+  <script async src="https://www.googletagmanager.com/gtag/js?id=UA-75982049-2"></script>
+  <script>
+    window.dataLayer = window.dataLayer || [];
+    function gtag(){dataLayer.push(arguments);}
+
+    gtag('js', new Date());
+    gtag('config', 'UA-75982049-2');
+  </script>
+
 </head>
-<body>
 
+  <body>
+    <div class="topbar">
+      <div class="fill">
+        <div class="container">
+          <h2 id="logo-wrap">
+            <a href="/" class="nav">
+              <img src="/images/logo/tvm-logo-small-black.png" width="100px">
+            </a>
+          </h2>
+          <ul class="nav" id="nav-bar">
+            
+            
+            
+
+
+
+  
     
-<div class="bannerPage">
-      <header class="header">
-      <div class="container">
-        <div class="headerInner d-flex justify-content-between align-items-center">
-          <div class="headerLogo">
-            <a href="/"><img src="/assets/images/logo.svg" alt="Logo"></a>
-          </div>
-          <div id="headMenu" class="headerNav">
-            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="/assets/images/close-icon.svg"
-                alt="Close"></button>
-                <ul class="nav">
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/community">Community</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/download">Download</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="/vta">VTA</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="/blog">Blog</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvm.apache.org/docs/">Docs</a>
-    </li>
+  
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://tvmconf.org/">Conference</a>
-    </li>
+      
+      
     
-    <li class="nav-item">
-        <a class="nav-link" href="https://github.com/apache/incubator-tvm/">Github</a>
-    </li>
+  
     
-</ul>
-            <div class="responsiveasfdropdown">
-              <button type="button" class="btn-link">
-                ASF
-              </button>
-              <ul>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+      
+      	
+      	<li><a href="/community">Community</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+      
+      	
+      	<li><a href="/download">Download</a></li>
+      	
+      
+      
     
-</ul>
-            </div>
-          </div>
-          <div class="responsiveMenuIcon">
-            <button type="button" id="menuBtn" class="btn-menu"><img src="/assets/images/menu-icon.svg"
-                alt="Menu Icon" /></button>
-          </div>
-          <div class="asfDropdown">
-            <div class="dropdown">
-              <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true"
-                aria-expanded="false">
-                ASF
-              </button>
-              <div class="dropdown-menu dropdown-menu-right">
-                <ul>
+  
     
-    <li>
-        <a href="https://www.apache.org/">Apache Homepage</a>
-    </li>
+      
+      	
+      	<li><a href="/about">About</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/licenses/">License</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/sponsorship.html">Sponsorship</a>
-    </li>
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/security/">Security</a>
-    </li>
+  
     
-    <li>
-        <a href="https://www.apache.org/foundation/thanks.html">Thanks</a>
-    </li>
+      
+      	
+      	<li><a href="/vta">VTA</a></li>
+      	
+      
+      
     
-    <li>
-        <a href="https://www.apache.org/events/current-event">Events</a>
-    </li>
+  
     
-</ul>
-              </div>
-            </div>
-          </div>
-        </div>
-      </div>
-    </header>
+      
+      
+      	
+      	<li><a href="/blog">Blog</a></li>
+      	
+      
+    
+  
 
-</div>
 
 
+
+            <li> <a href="https://tvm.apache.org/docs">Docs</a></li>
+            <li> <a href="https://tvmconf.org">TVM Conference</a></li>
+            <li> <a href="https://github.com/apache/incubator-tvm/">Github</a></li>
+            <li> <a href="/asf">ASF</a></li>
+          </ul>
+        </div>
+      </div>
+    </div>
+    
 <div class="container">
 <div class="content">
   <div class="row">
-    <div class="span14 w-100">
+    <div class="span14">
       <h1>Bridging PyTorch and TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-07-14T00:00:00-04:00" itemprop="datePublished">
+        <time datetime="2020-07-14T00:00:00-07:00" itemprop="datePublished">
           Jul 14, 2020
         </time>
         
@@ -181,56 +191,62 @@ specifically the part until we have a traced model.</p>
 <p>The PyTorch traced model takes around 0.65-0.7 seconds for 100 runs on my AMD Radeon VII with the example inputs, which means 6.5-7ms per run.
 Let us see whether TVM can get us something faster. Converting our model to TVM is a breeze:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">shape_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">i</span><span class="p">.</span><span class="n">debugName</span><span class="p">().</span><span class="n">split</span><span class="p">(</span><span class="s">'.'</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span> <span class="n">i</span><span class="p">.</span>< [...]
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">shape_list</span> <span class="o">=</span> <span class="p">[(</span><span class="n">i</span><span class="o">.</span><span class="n">debugName</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'.'</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span> <span class="n">i</span><span class="o">.</span>< [...]
 
-<span class="n">mod_bert</span><span class="p">,</span> <span class="n">params_bert</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">frontend</span><span class="p">.</span><span class="n">pytorch</span><span class="p">.</span><span class="n">from_pytorch</span><span class="p">(</span><span class="n">traced_model</span><span class="p">,</span>
+<span class="n">mod_bert</span><span class="p">,</span> <span class="n">params_bert</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">frontend</span><span class="o">.</span><span class="n">pytorch</span><span class="o">.</span><span class="n">from_pytorch</span><span class="p">(</span><span class="n">traced_model</span><span class="p">,</span>
                         <span class="n">shape_list</span><span class="p">,</span> <span class="n">default_dtype</span><span class="o">=</span><span class="s">"float32"</span><span class="p">)</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>There will be a few warnings about not finding dtype information, but it goes well!
 We can now build and run it. Building follows the standard TVM recipe. We also convert the PyTorch (cpu) tensors to TVM arrays.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">target</span> <span class="o">=</span> <span class="s">'rocm -model=gfx906'</span>  <span class="c1"># use what matches your GPU
-</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">target</span> <span class="o">=</span> <span class="s">'rocm -model=gfx906'</span>  <span class="c"># use what matches your GPU</span>
+
 <span class="n">target_host</span> <span class="o">=</span> <span class="s">'llvm'</span>
-<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">context</span><span class="p">(</span><span class="n">target</span><span class="p">)</span>
+<span class="n">ctx</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">context</span><span class="p">(</span><span class="n">target</span><span class="p">)</span>
+
+<span class="n">tt_a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">tokens_tensor</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">ctx</span><span class="p">)</span>
+<span class="n">st_a</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">array</span><span class="p">(</span><span class="n">segments_tensors</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">ctx</span><span class="p">)</span>
+</code></pre>
+</div>
 
-<span class="n">tt_a</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">tokens_tensor</span><span class="p">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">ctx</span><span class="p">)</span>
-<span class="n">st_a</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">segments_tensors</span><span class="p">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">ctx</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">compile_engine</span><span class="o">.</span><span class="n">get</span><span class="p">()</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span> <span class="c"># just to be sure, see https://github.com/apache/incub [...]
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">backend</span><span class="p">.</span><span class="n">compile_engine</span><span class="p">.</span><span class="n">get</span><span class="p">().</span><span class="n">clear</span><span class="p">()</span> <span class="c1"># just to be sure, see https://github.com/apache/incu [...]
-</span>
-<span class="k">with</span> <span class="n">tvm</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">PassContext</span><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
-        <span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">mod_bert</span><span class="p">,</span>
+<span class="k">with</span> <span class="n">tvm</span><span class="o">.</span><span class="n">transform</span><span class="o">.</span><span class="n">PassContext</span><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
+        <span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">mod_bert</span><span class="p">,</span>
                                      <span class="n">target</span><span class="o">=</span><span class="n">target</span><span class="p">,</span>
                                      <span class="n">target_host</span><span class="o">=</span><span class="n">target_host</span><span class="p">,</span>
                                      <span class="n">params</span><span class="o">=</span><span class="n">params_bert</span><span class="p">)</span>
-<span class="n">module</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">contrib</span><span class="p">.</span><span class="n">graph_runtime</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
-</code></pre></div></div>
+<span class="n">module</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">contrib</span><span class="o">.</span><span class="n">graph_runtime</span><span class="o">.</span><span class="n">create</span><span class="p">(</span><span class="n">graph</span><span class="p">,</span> <span class="n">lib</span><span class="p">,</span> <span class="n">ctx</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>This will warn us a few times:</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    WARNING:autotvm:Cannot find config for ... batch_matmul.cuda .... A fallback configuration is used, which may bring great performance regression.
-</code></pre></div></div>
+<div class="highlighter-rouge"><pre class="highlight"><code>    WARNING:autotvm:Cannot find config for ... batch_matmul.cuda .... A fallback configuration is used, which may bring great performance regression.
+</code></pre>
+</div>
 
 <p>Uh oh, <em>may bring great performance regression</em>. Let us see.</p>
 
 <p>But first we run the model and see if the outputs match:</p>
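+
+<p>For reference, here is a minimal sketch of this run-and-compare step; the variable <code class="highlighter-rouge">res_pt</code>, assumed to hold the traced PyTorch model’s outputs, is not shown above:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import numpy
+
+module.set_input(shape_list[0][0], tt_a)  # input names as recorded in shape_list
+module.set_input(shape_list[1][0], st_a)
+module.set_input(**params)
+module.run()
+o0 = module.get_output(0).asnumpy()
+o1 = module.get_output(1).asnumpy()
+(numpy.abs(o0 - res_pt[0].numpy()).max(),
+ numpy.abs(o1 - res_pt[1].numpy()).max())
+</code></pre>
+</div>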
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="p">(</span><span class="mf">8.583069e-06</span><span class="p">,</span> <span class="mf">8.493662e-07</span><span class="p">)</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>    <span class="p">(</span><span class="mf">8.583069e-06</span><span class="p">,</span> <span class="mf">8.493662e-07</span><span class="p">)</span>
+</code></pre>
+</div>
 
 <p>Looks good. Remember that we’re computing in float32, so $10^{-6}$ish is a good result.</p>
 
 <p>After building our model and setting the parameters, we time our model like this:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">x</span><span class="p">():</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">x</span><span class="p">():</span>
     <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">):</span>
-        <span class="n">module</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
-    <span class="n">ctx</span><span class="p">.</span><span class="n">sync</span><span class="p">()</span>
+        <span class="n">module</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
+    <span class="n">ctx</span><span class="o">.</span><span class="n">sync</span><span class="p">()</span>
 <span class="n">x</span><span class="p">()</span>
 <span class="o">%</span><span class="n">timeit</span> <span class="n">x</span><span class="p">()</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>Ouch, it takes 6.65s per 100 runs, or 67ms per run of the model. That’s slow indeed. But the warning said that it was because it could not find (tuned) configurations. Let us then tune the tasks.</p>
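+
+<p>The tuning run itself is a longer story; as a minimal sketch (the trial count and log file name are illustrative, not the exact values used), extracting and tuning the tasks with autotvm might look like this:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import tvm.autotvm
+
+tasks = tvm.autotvm.task.extract_from_program(mod_bert["main"], target=target,
+                                              params=params_bert)
+measure_option = tvm.autotvm.measure_option(
+    builder=tvm.autotvm.LocalBuilder(),
+    runner=tvm.autotvm.LocalRunner(number=20, repeat=3, timeout=10))
+for task in tasks:
+    tuner = tvm.autotvm.tuner.XGBTuner(task)
+    tuner.tune(n_trial=min(1000, len(task.config_space)),
+               measure_option=measure_option,
+               callbacks=[tvm.autotvm.callback.log_to_file('bert-tuning.log')])
+
+# rebuild, now picking up the best configurations from the log
+with tvm.autotvm.apply_history_best('bert-tuning.log'):
+    with tvm.transform.PassContext(opt_level=3):
+        graph, lib, params = tvm.relay.build(mod_bert, target=target,
+                                             target_host=target_host,
+                                             params=params_bert)
+</code></pre>
+</div>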
 
@@ -247,144 +263,146 @@ Now it’s in the region of 6.5-7ms per run, similar to PyTorch. This is what we
 
 <p>Let us take a closer look at what’s going on in BERT.</p>
 
-<p>Like many deep learning models, BERT comes with a bit some prologue (vocabulary embeddings) and epilogue (pooling) and the bulk is organized into similar-looking blocks, here we have 12 <code class="language-plaintext highlighter-rouge">BertLayer</code> modules.
-The <code class="language-plaintext highlighter-rouge">attention_mask</code> is jsut to prevent BERT from looking at the answer when dealing with the question.</p>
+<p>Like many deep learning models, BERT comes with a bit of prologue (vocabulary embeddings) and epilogue (pooling), and the bulk is organized into similar-looking blocks; here we have 12 <code class="highlighter-rouge">BertLayer</code> modules.
+The <code class="highlighter-rouge">attention_mask</code> is just there to prevent BERT from looking at the answer when dealing with the question.</p>
 
 <p><img src="/images/bert-pytorch/bert_model.svg" alt="Bert Model" width="100%" /></p>
 
 <p>So let us zoom in and look at a BertLayer in detail, since that ultimately is what we need to make fast.
-As we see in the net diagram, the main part of the <code class="language-plaintext highlighter-rouge">BertLayer</code> module is a submodule <code class="language-plaintext highlighter-rouge">BertSelfAttention</code>.</p>
+As we see in the net diagram, the main part of the <code class="highlighter-rouge">BertLayer</code> module is a submodule <code class="highlighter-rouge">BertSelfAttention</code>.</p>
 
 <p><img src="/images/bert-pytorch/bert_layer.svg" alt="BertLayer" width="100%" /></p>
 
-<p>Now the <code class="language-plaintext highlighter-rouge">BertSelfAttention</code> captures the famed self-attention mechanism that is the hallmark of transformer models. (I cannot recommend Sascha Rush’s <a href="http://nlp.seas.harvard.edu/2018/04/03/attention.html">Annotated Transformer</a> enough as a detailed walkthrough.)</p>
+<p>Now the <code class="highlighter-rouge">BertSelfAttention</code> captures the famed self-attention mechanism that is the hallmark of transformer models. (I cannot recommend Sascha Rush’s <a href="http://nlp.seas.harvard.edu/2018/04/03/attention.html">Annotated Transformer</a> enough as a detailed walkthrough.)</p>
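+
+<p>As a refresher, self-attention boils down to scaled dot-product attention; a textbook sketch (the standard formulation, not the HuggingFace implementation itself) looks like this:</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code>import math
+import torch
+
+def scaled_dot_product_attention(q, k, v, mask=None):
+    # q, k, v: (batch, heads, seq_len, head_dim)
+    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
+    if mask is not None:
+        scores = scores + mask  # mask holds large negative values at masked positions
+    return torch.softmax(scores, dim=-1) @ v
+</code></pre>
+</div>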
 
 <h2 id="putting-the-bertlayer-under-the-microscope">Putting the BertLayer under the Microscope</h2>
 
 <p>If we want to go into details, we will want to run a BertLayer individually.
-We grab the inputs of a BertLayer (see the Notebook for how) and convert a single <code class="language-plaintext highlighter-rouge">BertLayer</code> to TVM as we did for the entire model.</p>
+We grab the inputs of a BertLayer (see the Notebook for how) and convert a single <code class="highlighter-rouge">BertLayer</code> to TVM as we did for the entire model.</p>
 
 <p>To look at the TVM module, we define a little visualization helper (loosely based on TVM <a href="https://github.com/apache/incubator-tvm/pull/4370">PR#4370</a>).</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">graphviz</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">graphviz</span>
 <span class="k">def</span> <span class="nf">visualize</span><span class="p">(</span><span class="n">expr</span><span class="p">,</span> <span class="n">collapse_small</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">node_attr_dict</span> <span class="o">=</span> <span class="p">{}):</span>
     <span class="k">def</span> <span class="nf">collect_ops</span><span class="p">(</span><span class="n">node</span><span class="p">):</span>
         <span class="n">ops</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
         <span class="k">def</span> <span class="nf">visitor</span><span class="p">(</span><span class="n">e</span><span class="p">):</span>
-            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">ir</span><span class="p">.</span><span class="n">Op</span><span class="p">):</span>
-                <span class="n">ops</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">e</span><span class="p">.</span><span class="n">name</span><span class="p">)</span>
-        <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">analysis</span><span class="p">.</span><span class="n">post_order_visit</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">visitor</span><span class="p">)</span>
+            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">e</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">ir</span><span class="o">.</span><span class="n">Op</span><span class="p">):</span>
+                <span class="n">ops</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
+        <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">analysis</span><span class="o">.</span><span class="n">post_order_visit</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">visitor</span><span class="p">)</span>
         <span class="k">return</span> <span class="n">ops</span>
 
-    <span class="c1"># node_dict maps a Relay node to an index (node ID)
-</span>    <span class="k">def</span> <span class="nf">_traverse_expr</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node_dict</span><span class="p">):</span>
+    <span class="c"># node_dict maps a Relay node to an index (node ID)</span>
+    <span class="k">def</span> <span class="nf">_traverse_expr</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">node_dict</span><span class="p">):</span>
         <span class="k">if</span> <span class="n">node</span> <span class="ow">in</span> <span class="n">node_dict</span><span class="p">:</span>
             <span class="k">return</span>
         <span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="p">]</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">node_dict</span><span class="p">)</span>
 
     <span class="n">node_dict</span> <span class="o">=</span> <span class="p">{}</span>
-    <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">analysis</span><span class="p">.</span><span class="n">post_order_visit</span><span class="p">(</span><span class="n">expr</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">_traverse_expr</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">node_dict</ [...]
+    <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">analysis</span><span class="o">.</span><span class="n">post_order_visit</span><span class="p">(</span><span class="n">expr</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">_traverse_expr</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">node_dict</ [...]
 
     <span class="n">relayviz_nodes</span> <span class="o">=</span> <span class="p">[]</span>
 
-    <span class="n">dot</span> <span class="o">=</span> <span class="n">graphviz</span><span class="p">.</span><span class="n">Digraph</span><span class="p">(</span><span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="p">)</span>
-    <span class="n">dot</span><span class="p">.</span><span class="n">attr</span><span class="p">(</span><span class="s">'node'</span><span class="p">,</span> <span class="n">shape</span> <span class="o">=</span> <span class="s">'box'</span><span class="p">)</span>
+    <span class="n">dot</span> <span class="o">=</span> <span class="n">graphviz</span><span class="o">.</span><span class="n">Digraph</span><span class="p">(</span><span class="n">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="p">)</span>
+    <span class="n">dot</span><span class="o">.</span><span class="n">attr</span><span class="p">(</span><span class="s">'node'</span><span class="p">,</span> <span class="n">shape</span> <span class="o">=</span> <span class="s">'box'</span><span class="p">)</span>
 
     <span class="k">def</span> <span class="nf">to_str</span><span class="p">(</span><span class="n">node</span><span class="p">):</span>
-        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Constant</span><span class="p">):</span>
-            <span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="n">node</span><span class="p">).</span><span class="n">lstrip</span><span class="p">(</span><span class="s">'Constant('</span><span class="p">)[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
+        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Constant</span><span class="p">):</span>
+            <span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="n">node</span><span class="p">)</span><span class="o">.</span><span class="n">lstrip</span><span class="p">(</span><span class="s">'Constant('</span><span class="p">)[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
         <span class="k">else</span><span class="p">:</span>
             <span class="k">raise</span> <span class="nb">NotImplementedError</span><span class="p">(</span><span class="s">"to_str:"</span> <span class="o">+</span> <span class="nb">repr</span><span class="p">(</span><span class="n">node</span><span class="p">))</span>
 
     <span class="k">def</span> <span class="nf">is_small_const</span><span class="p">(</span><span class="n">c</span><span class="p">):</span>
-        <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="n">collapse_small</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Constant</span><span class="p">)):</span>
+        <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="n">collapse_small</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Constant</span><span class="p">)):</span>
             <span class="k">return</span> <span class="bp">False</span>
-        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">runtime</span><span class="p">.</span><span class="n">ndarray</span><span class="p">.</span><span class="n">NDArray</span><span class="p">):</span>
-            <span class="k">return</span> <span class="n">numpy</span><span class="p">.</span><span class="n">prod</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">10</span>
+        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">runtime</span><span class="o">.</span><span class="n">ndarray</span><span class="o">.</span><span class="n">NDArray</span><span class="p">):</span>
+            <span class="k">return</span> <span class="n">numpy</span><span class="o">.</span><span class="n">prod</span><span class="p">(</span><span class="n">c</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">10</span>
         <span class="k">return</span> <span class="bp">True</span>
 
-    <span class="c1"># Sort by node ID
-</span>    <span class="k">for</span> <span class="n">node</span><span class="p">,</span> <span class="n">node_id</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">node_dict</span><span class="p">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><s [...]
-        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Function</span><span class="p">):</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Function'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="p">.</span><span class="n">body</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Var</span><span class="p">):</span>
-            <span class="k">if</span> <span class="n">node</span><span class="p">.</span><span class="n">type_annotation</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
-                <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">type_annotation</span><span class="p">,</span> <span class="s">'shape'</span><span class="p">):</span>
-                    <span class="n">shape</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">node</span><span class="p">.</span><span class="n">type_annotation</span><span class="p">.</span><span class="n">shape</span><span class="p">])</span>
-                    <span class="n">dtype</span> <span class="o">=</span> <span class="n">node</span><span class="p">.</span><span class="n">type_annotation</span><span class="p">.</span><span class="n">dtype</span>
-                    <span class="n">typstr</span> <span class="o">=</span> <span class="s">'Tensor[{}, {}]'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="p">)</span>
+    <span class="c"># Sort by node ID</span>
+    <span class="k">for</span> <span class="n">node</span><span class="p">,</span> <span class="n">node_id</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">node_dict</span><span class="o">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span cla [...]
+        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Function</span><span class="p">):</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Function'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="o">.</span><span class="n">body</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Var</span><span class="p">):</span>
+            <span class="k">if</span> <span class="n">node</span><span class="o">.</span><span class="n">type_annotation</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
+                <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">type_annotation</span><span class="p">,</span> <span class="s">'shape'</span><span class="p">):</span>
+                    <span class="n">shape</span> <span class="o">=</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">int</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">type_annotation</span><span class="o">.</span><span class="n">shape</span><span class="p">])</span>
+                    <span class="n">dtype</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">type_annotation</span><span class="o">.</span><span class="n">dtype</span>
+                    <span class="n">typstr</span> <span class="o">=</span> <span class="s">'Tensor[{}, {}]'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">shape</span><span class="p">,</span> <span class="n">dtype</span><span class="p">)</span>
                 <span class="k">else</span><span class="p">:</span>
-                    <span class="n">typstr</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">type_annotation</span><span class="p">)</span>
+                    <span class="n">typstr</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">type_annotation</span><span class="p">)</span>
             <span class="k">else</span><span class="p">:</span>
                 <span class="n">typstr</span> <span class="o">=</span> <span class="s">'?'</span>
             <span class="n">d</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">shape</span> <span class="o">=</span> <span class="s">'ellipse'</span><span class="p">)</span>
-            <span class="n">d</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span>
-                     <span class="s">'{}: {}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span>
-                         <span class="n">node</span><span class="p">.</span><span class="n">name_hint</span><span class="p">,</span> <span class="n">typstr</span>
+            <span class="n">d</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span>
+                     <span class="s">'{}: {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span>
+                         <span class="n">node</span><span class="o">.</span><span class="n">name_hint</span><span class="p">,</span> <span class="n">typstr</span>
                      <span class="p">),</span> <span class="o">**</span><span class="n">d</span><span class="p">)</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Tuple</span><span class="p">):</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Tuple[...])'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
-            <span class="k">for</span> <span class="n">field</span> <span class="ow">in</span> <span class="n">node</span><span class="p">.</span><span class="n">fields</span><span class="p">:</span>
-                <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">field</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Constant</span><span class="p">):</span>
-
-            <span class="k">if</span> <span class="ow">not</span> <span class="n">is_small_const</span><span class="p">(</span><span class="n">node</span><span class="p">):</span> <span class="c1"># small consts are shown in ops
-</span>                <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Constant({}, {})'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class= [...]
-                        <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Call</span><span class="p">):</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Tuple</span><span class="p">):</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Tuple[...])'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+            <span class="k">for</span> <span class="n">field</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">fields</span><span class="p">:</span>
+                <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">field</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Constant</span><span class="p">):</span>
+
+            <span class="k">if</span> <span class="ow">not</span> <span class="n">is_small_const</span><span class="p">(</span><span class="n">node</span><span class="p">):</span> <span class="c"># small consts are shown in ops</span>
+                <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Constant({}, {})'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">shape</span><span class="p">,</s [...]
+                        <span class="o">**</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Call</span><span class="p">):</span>
             <span class="n">args_with_edge</span> <span class="o">=</span> <span class="p">[]</span>
             <span class="n">arg_str_list</span> <span class="o">=</span> <span class="p">[]</span>
-            <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">node</span><span class="p">.</span><span class="n">args</span><span class="p">:</span>
+            <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">args</span><span class="p">:</span>
                 <span class="k">if</span> <span class="n">is_small_const</span><span class="p">(</span><span class="n">arg</span><span class="p">):</span>
-                    <span class="n">arg_str_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">to_str</span><span class="p">(</span><span class="n">arg</span><span class="p">))</span>
+                    <span class="n">arg_str_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">to_str</span><span class="p">(</span><span class="n">arg</span><span class="p">))</span>
                 <span class="k">else</span><span class="p">:</span>
-                    <span class="n">arg_str_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'·'</span><span class="p">)</span>
-                    <span class="n">args_with_edge</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
-            <span class="n">arg_str</span> <span class="o">=</span> <span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">arg_str_list</span><span class="p">)</span>
-            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">op</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">ir</span><span class="p">.</span><span class="n">Op</span><span class="p">):</span>
-                <span class="n">name</span> <span class="o">=</span> <span class="n">node</span><span class="p">.</span><span class="n">op</span><span class="p">.</span><span class="n">name</span>
-                <span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span><span class="n">k</span><span class="p">:</span><span class="nb">getattr</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">attrs</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">node</span><span class="p">.</span><span class= [...]
-                <span class="c1">#attrs = inspect.getmembers(node.attrs)
-</span>                <span class="n">attr_str_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="s">'='</span><span class="o">+</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">v</span><span cla [...]
+                    <span class="n">arg_str_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s">'·'</span><span class="p">)</span>
+                    <span class="n">args_with_edge</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
+            <span class="n">arg_str</span> <span class="o">=</span> <span class="s">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">arg_str_list</span><span class="p">)</span>
+            <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">ir</span><span class="o">.</span><span class="n">Op</span><span class="p">):</span>
+                <span class="n">name</span> <span class="o">=</span> <span class="n">node</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">name</span>
+                <span class="n">attrs</span> <span class="o">=</span> <span class="p">{</span><span class="n">k</span><span class="p">:</span><span class="nb">getattr</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">attrs</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class= [...]
+                <span class="c">#attrs = inspect.getmembers(node.attrs)</span>
+                <span class="n">attr_str_list</span> <span class="o">=</span> <span class="p">[</span><span class="n">k</span><span class="o">+</span><span class="s">'='</span><span class="o">+</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">v</span><span class="p"> [...]
                 <span class="k">if</span> <span class="n">attr_str_list</span><span class="p">:</span>
-                    <span class="n">attr_str</span> <span class="o">=</span> <span class="s">'| '</span><span class="o">+</span> <span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">attr_str_list</span><span class="p">)</span>
+                    <span class="n">attr_str</span> <span class="o">=</span> <span class="s">'| '</span><span class="o">+</span> <span class="s">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">attr_str_list</span><span class="p">)</span>
                 <span class="k">else</span><span class="p">:</span>
                     <span class="n">attr_str</span> <span class="o">=</span> <span class="s">''</span>
             <span class="k">else</span><span class="p">:</span>
                 <span class="n">ops</span> <span class="o">=</span> <span class="n">collect_ops</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>
                 <span class="k">if</span> <span class="n">ops</span><span class="p">:</span>
-                    <span class="n">name</span> <span class="o">=</span> <span class="s">'_'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">ops</span><span class="p">)</span>
+                    <span class="n">name</span> <span class="o">=</span> <span class="s">'_'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">ops</span><span class="p">)</span>
                 <span class="k">else</span><span class="p">:</span>
                     <span class="n">name</span> <span class="o">=</span> <span class="s">'...'</span>
                 <span class="n">attr_str</span> <span class="o">=</span> <span class="s">''</span>
-            <span class="n">s</span> <span class="o">=</span> <span class="s">f'</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">(</span><span class="si">{</span><span class="n">arg_str</span><span class="si">}{</span><span class="n">attr_str</span><span class="si">}</span><span class="s">)'</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="n">s</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+            <span class="n">s</span> <span class="o">=</span> <span class="n">f</span><span class="s">'{name}({arg_str}{attr_str})'</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="n">s</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
             <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args_with_edge</span><span class="p">:</span>
-                <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">arg</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">ir</span><span class="p">.</span><span class="n">Op</span><span class="p">):</span>
-            <span class="c1"># dot.node(str(node_id), 'Op {}'.format(node.name))
-</span>            <span class="k">pass</span> <span class="c1"># covered in call
-</span>        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">TupleGetItem</span><span class="p">):</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'TupleGetItem(idx={})'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">node</span><span class="p">.</span><span class="n">index</span><span class="p">),</span> <span class="o">**</span><span class="n">nod [...]
-            <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="p">.</span><span class="n">tuple_value</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
-        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">Let</span><span class="p">):</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Let(XX)'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="p">.</span><span class="n">value</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
-            <span class="n">dot</span><span class="p">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="p">.</span><span class="n">var</span><span class="p">]))</span>
+                <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">arg</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">ir</span><span class="o">.</span><span class="n">Op</span><span class="p">):</span>
+            <span class="c"># dot.node(str(node_id), 'Op {}'.format(node.name))</span>
+            <span class="k">pass</span> <span class="c"># covered in call</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">TupleGetItem</span><span class="p">):</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'TupleGetItem(idx={})'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">index</span><span class="p">),</span> <span class="o">**</span><span class="n">node [...]
+            <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="o">.</span><span class="n">tuple_value</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
+        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">Let</span><span class="p">):</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">node</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="s">'Let(XX)'</span><span class="p">,</span> <span class="o">**</span><span class="n">node_attr_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="p">{}))</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="o">.</span><span class="n">value</span><span class="p">]),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">))</span>
+            <span class="n">dot</span><span class="o">.</span><span class="n">edge</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">node_id</span><span class="p">),</span> <span class="nb">str</span><span class="p">(</span><span class="n">node_dict</span><span class="p">[</span><span class="n">node</span><span class="o">.</span><span class="n">var</span><span class="p">]))</span>
         <span class="k">else</span><span class="p">:</span>
             <span class="k">raise</span> <span class="nb">RuntimeError</span><span class="p">(</span>
-                <span class="s">'Unknown node type. node_id: {}, node: {}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">node_id</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)))</span>
+                <span class="s">'Unknown node type. node_id: {}, node: {}'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">node_id</span><span class="p">,</span> <span class="nb">type</span><span class="p">(</span><span class="n">node</span><span class="p">)))</span>
 
     <span class="k">return</span> <span class="n">dot</span>
 
-</code></pre></div></div>
+</code></pre>
+</div>
 
-<p>Let’s run that on our main function. For some reason (well, to be fully general, probably) the PyTorch converter will convert <code class="language-plaintext highlighter-rouge">Linear</code> layers to <code class="language-plaintext highlighter-rouge">batch_matmul</code> rather than just <code class="language-plaintext highlighter-rouge">dense</code>. We’ll get back to this in a bit. As TVM’s <code class="language-plaintext highlighter-rouge">batch_matmul</code> has the contraction ax [...]
+<p>Let’s run that on our main function. For some reason (well, to be fully general, probably) the PyTorch converter will convert <code class="highlighter-rouge">Linear</code> layers to <code class="highlighter-rouge">batch_matmul</code> rather than just <code class="highlighter-rouge">dense</code>. We’ll get back to this in a bit. As TVM’s <code class="highlighter-rouge">batch_matmul</code> has the contraction axis last on both operands (unlike PyTorch), there are quite a few transpose o [...]
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">visualize</span><span class="p">(</span><span class="n">mod</span><span class="p">[</span><span class="s">'main'</span><span class="p">])</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">visualize</span><span class="p">(</span><span class="n">mod</span><span class="p">[</span><span class="s">'main'</span><span class="p">])</span>
+</code></pre>
+</div>
 
 <p><img src="/images/bert-pytorch/bert-tvm_49_0.svg" alt="svg" width="100%" /></p>
 
@@ -394,7 +412,7 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
 
 <p>Just like the full model, we can run and time our submodule after checking that it computes the same quantities.</p>
 
-<p>100 runs take 20.2ms. The back-of-the-envelope calculation here is that with <code class="language-plaintext highlighter-rouge">BertLayer</code> in PyTorch we are spending about 0.2ms in this layer, so about 2.4ms on 12 layers - not the majority, but a sizeable part of the 6-7ms overall runtime. Let’s compare to TVM. (A good rule is to never optimize without measuring.)</p>
+<p>100 runs take 20.2ms. The back-of-the-envelope calculation here is that with <code class="highlighter-rouge">BertLayer</code> in PyTorch we are spending about 0.2ms in this layer, so about 2.4ms on 12 layers - not the majority, but a sizeable part of the 6-7ms overall runtime. Let’s compare to TVM. (A good rule is to never optimize without measuring.)</p>
 
 <p>Similarly, TVM clocks in at 18.2ms for 100 runs. So here we are again roughly on par with PyTorch.</p>
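
<p>For reference, a sketch of how such timings can be taken on the TVM side; <code class="highlighter-rouge">module</code> here stands in for the compiled graph runtime module from the notebook:</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code># time 100 runs of the compiled module; time_evaluator handles warm-up
# and device synchronization for us
ftimer = module.module.time_evaluator("run", tvm.gpu(), number=100)
print("%.1f ms for 100 runs" % (ftimer().mean * 100 * 1000))
</code></pre>
</div>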
 
@@ -402,18 +420,19 @@ We grab the inputs of a BertLayer (see the Notebook for how) and convert a singl
(A while ago, this did not succeed because it had distinct shape arguments, but this has since been solved by the TVM developers in the dynamic-to-static conversion pass.)
Also, the model parameters are reshaped and transposed. Can we get rid of that, too?
Yes. And for that we would first <em>bind</em> the parameters, i.e. put them into the model. Then the parameters become constants instead of input nodes.
-With the <code class="language-plaintext highlighter-rouge">Foldconstant</code> pass, we can propagate the constants through the <code class="language-plaintext highlighter-rouge">transpose</code>s and <code class="language-plaintext highlighter-rouge">reshape</code>s to move them closer to the matmuls.</p>
+With the <code class="highlighter-rouge">Foldconstant</code> pass, we can propagate the constants through the <code class="highlighter-rouge">transpose</code>s and <code class="highlighter-rouge">reshape</code>s to move them closer to the matmuls.</p>
 
 <p>After these three (which TVM will do when we compile a relay model), our model looks like this:</p>
 
 <p><img src="/images/bert-pytorch/bert-tvm_72_0.svg" alt="svg" width="100%" /></p>
 
-<p>And now comes an interesting trick. It is more efficient to merge the three batch matmuls with the same input into a single <code class="language-plaintext highlighter-rouge">batch_matmul</code>. We implemented a pass doing this in <a href="https://github.com/apache/incubator-tvm/pull/5791">TVM PR 5791</a>. So let’s call it and also have another constant-folding pass.</p>
+<p>And now comes an interesting trick. It is more efficient to merge the three batch matmuls with the same input into a single <code class="highlighter-rouge">batch_matmul</code>. We implemented a pass doing this in <a href="https://github.com/apache/incubator-tvm/pull/5791">TVM PR 5791</a>. So let’s call it and also have another constant-folding pass.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">new_mod</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">CombineParallelBatchMatmul</span><span class="p">()(</span><span class="n">new_mod</span><span class="p">)</span>
-<span class="n">new_mod</span> <span class="o">=</span> <span class="n">tvm</span><span class="p">.</span><span class="n">relay</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">FoldConstant</span><span class="p">()(</span><span class="n">new_mod</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">new_mod</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">transform</span><span class="o">.</span><span class="n">CombineParallelBatchMatmul</span><span class="p">()(</span><span class="n">new_mod</span><span class="p">)</span>
+<span class="n">new_mod</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">transform</span><span class="o">.</span><span class="n">FoldConstant</span><span class="p">()(</span><span class="n">new_mod</span><span class="p">)</span>
 <span class="n">visualize</span><span class="p">(</span><span class="n">new_mod</span><span class="p">[</span><span class="s">"main"</span><span class="p">])</span>
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p><img src="/images/bert-pytorch/bert-tvm_74_0.svg" alt="svg" width="100%" /></p>
 
@@ -439,12 +458,12 @@ We go through the same exercise as above.</p>
Also, when I investigated some inner layer, I grabbed its inputs to convert and feed into the TVM model. I do believe that this is a very effective technique.</p>
 
 <p>Sometimes, however, it is difficult to assess whether a deviation between the results is from numerical accuracy or from an error somewhere.
-When I initially converted the model, the <code class="language-plaintext highlighter-rouge">SelfAttention</code> submodule output was replicated by the TVM model to about 1e-6.
+When I initially converted the model, the <code class="highlighter-rouge">SelfAttention</code> submodule output was replicated by the TVM model to about 1e-6.
However, the BertLayer conversion had something like 1e-3. I was not entirely sure whether that might be due to accumulated numerical errors or some material deviation somewhere.
 (This turned out to be the GELU activation, which was converted to FastGELU.)</p>
 
 <p>One of the things I like to do in this case is jump to double precision and check there. Numerical errors should get much smaller, while other deviations would remain of the same order.
-With the PyTorch frontend, you can trace the model converted to float64 on the PyTorch side if you pass <code class="language-plaintext highlighter-rouge">default_dtype="float64"</code> to the conversion function.</p>
+With the PyTorch frontend, you can trace the model converted to float64 on the PyTorch side if you pass <code class="highlighter-rouge">default_dtype="float64"</code> to the conversion function.</p>
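
<p>A sketch of what that looks like (<code class="highlighter-rouge">inputs</code> and <code class="highlighter-rouge">shape_list</code> stand in for the notebook’s variables):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code># trace the double-precision model and convert with a matching default dtype
model64 = pytorch_model.double()
traced64 = torch.jit.trace(model64, tuple(i.double() for i in inputs))
mod64, params64 = tvm.relay.frontend.from_pytorch(
    traced64, shape_list, default_dtype="float64")
</code></pre>
</div>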
 
 <p>Running the module and comparing to PyTorch should now have 1e-14 or so deviation.</p>
 
@@ -463,21 +482,22 @@ With the PyTorch frontend, you can trace the model converted to float64 on the P
 
<p>In this second part we want to see if we can use TVM while training BERT in PyTorch.
 Of course, this opens an entire new can of worms as we need to deal with autodifferentiation.
-While we stay with the theme from above and take <code class="language-plaintext highlighter-rouge">BertLayer</code> as the example, our methodology is representative of non-trivial modules in general.
+While we stay with the theme from above and take <code class="highlighter-rouge">BertLayer</code> as the example, our methodology is representative of non-trivial modules in general.
 We will want to divert the computation during training to TVM.</p>
 
 <p>So the user can take a (traceable) module and do</p>
-<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>add_tvm_dispatch(module, sample_input)
-</code></pre></div></div>
+<div class="highlighter-rouge"><pre class="highlight"><code>add_tvm_dispatch(module, sample_input)
+</code></pre>
+</div>
<p>and then, if she calls the module with inputs of the same shape as the sample_input, she’ll get the outputs computed by TVM (as PyTorch tensors, of course); if not, it’ll just use the regular forward.</p>
 
<p>But we already hinted at the bad news: in this part we will see how to do these things; we will not yet achieve a great speedup.</p>
 
 <p>But enough talk, let us dive right in!
-Again, we get our relay model by running a traced <code class="language-plaintext highlighter-rouge">BertLayer</code> from the transformer <code class="language-plaintext highlighter-rouge">Bert</code> model through <code class="language-plaintext highlighter-rouge">tvm.relay.frontend.from_pytorch</code>.</p>
+Again, we get our relay model by running a traced <code class="highlighter-rouge">BertLayer</code> from the transformer <code class="highlighter-rouge">Bert</code> model through <code class="highlighter-rouge">tvm.relay.frontend.from_pytorch</code>.</p>
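
<p>Concretely, that step looks something like the following sketch (<code class="highlighter-rouge">bert_layer</code>, <code class="highlighter-rouge">hidden_states</code>, and <code class="highlighter-rouge">attention_mask</code> are stand-ins for the notebook’s variables; the input names match the ones we use below):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code># trace the single layer and import it into relay
traced_layer = torch.jit.trace(bert_layer, (hidden_states, attention_mask))
shape_list = [("input", list(hidden_states.shape)),
              ("attention_mask", list(attention_mask.shape))]
mod, params = tvm.relay.frontend.from_pytorch(traced_layer, shape_list)
</code></pre>
</div>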
 
 <p>One thing we’ll do in between is to move from a modular interface in PyTorch - with named parameters - to a functional
-interface (which is what TVM can do for us). The first thing we want to do for that is arrange for the function arguments to be in an order that we can work with - i.e. first the direct inputs to the module and then the parameters in the same order that PyTorch uses them. After this operation, our <code class="language-plaintext highlighter-rouge">BertLayer</code> in TVM looks like this:</p>
+interface (which is what TVM can do for us). The first thing we want to do for that is arrange for the function arguments to be in an order that we can work with - i.e. first the direct inputs to the module and then the parameters in the same order that PyTorch uses them. After this operation, our <code class="highlighter-rouge">BertLayer</code> in TVM looks like this:</p>
 
 <p><img src="/images/bert-pytorch/pytorch-tvm-training_20_0.svg" alt="svg" width="100%" /></p>
 
@@ -486,21 +506,21 @@ interface (which is what TVM can do for us). The first thing we want to do for t
 <p>But we also have a few new transformations:</p>
 
 <ul>
-  <li>One particularity of the Autodifferentiation is that it’ll use a lot of <code class="language-plaintext highlighter-rouge">..._like</code> operations to broadcast or “unbroadcast” (summation is the dual of broadcasting w.r.t. autodifferentiation) things. But this means that you now have two tensor arguments, even if the latter doesn’t really need a gradient. <code class="language-plaintext highlighter-rouge">ZappLike</code> replaces those operations with the corresponding functions [...]
+  <li>One particularity of autodifferentiation is that it’ll use a lot of <code class="highlighter-rouge">..._like</code> operations to broadcast or “unbroadcast” (summation is the dual of broadcasting w.r.t. autodifferentiation) things. But this means that you now have two tensor arguments, even if the latter doesn’t really need a gradient. <code class="highlighter-rouge">ZappLike</code> replaces those operations with the corresponding functions taking a shape parameter instead.</li>
  <li>Another thing is the “rooting” of derivatives. TVM generates tensors of all ones with the same shape as the return values of our function as the starting point for the chain rule. These are then multiplied with the derivatives of our operations. But multiplication with ones is not doing much, so we strike that. Similarly, TVM initializes the gradient of a variable (an input) to zeros of the same shape. If it isn’t used, the gradient will be zero, but if it is, the “real gradient” w [...]
-  <li>TVM doesn’t have a training variant for the <code class="language-plaintext highlighter-rouge">LayerNorm</code> (or <code class="language-plaintext highlighter-rouge">BatchNorm</code> or others). So we implement a pass to spell out the computation.</li>
+  <li>TVM doesn’t have a training variant for the <code class="highlighter-rouge">LayerNorm</code> (or <code class="highlighter-rouge">BatchNorm</code> or others). So we implement a pass to spell out the computation.</li>
  <li>TVM also doesn’t have training dropout. Here the problem is somewhat harder to fix, as TVM doesn’t currently have random number generation. We instead replace the dropout with a construct taking a random bernoulli draw (of 0/1 values) and mimicking dropout with that (see the sketch after this list). The idea is that we’ll use PyTorch to generate this mask for us. This has the added benefit that (if we generate dropout masks in the same order as PyTorch) we’ll get the exact same result.</li>
 </ul>
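
<p>To illustrate the dropout replacement from the last bullet, a minimal stand-alone sketch; note that the mask PyTorch hands us contains 0s and 1/(1-p)s, so the rescaling is already baked in:</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>import torch

p = 0.1
x = torch.randn(3, 4)
# draw a mask of 0 and 1/(1-p) entries, exactly as dropout would
mask = torch.nn.functional.dropout(torch.ones_like(x), p=p)
y = x * mask  # reproduces torch.nn.functional.dropout(x, p=p) for this draw
</code></pre>
</div>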
 
-<p>As hinted at above, TVM’s gradient taking assumes that it is the last element in the computation (the ones-Tensors discussed above). This isn’t a good fit with PyTorch’s modular view which expects a <code class="language-plaintext highlighter-rouge">grad_out</code> for each output to be given. Happily, this is computationally equivalent to multiplying by grad out and summation, so we amend our function with that. We wish to be flexible, so we allow both functions returning a single te [...]
+<p>As hinted at above, TVM’s gradient taking assumes that it is the last element in the computation (the ones-Tensors discussed above). This isn’t a good fit with PyTorch’s modular view which expects a <code class="highlighter-rouge">grad_out</code> for each output to be given. Happily, this is computationally equivalent to multiplying by grad_out and summing, so we amend our function with that. We wish to be flexible, so we allow both functions returning a single tensor and those retu [...]
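
<p>The equivalence is easy to check in PyTorch with a tiny self-contained example:</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>import torch

x = torch.randn(5, requires_grad=True)
y = x ** 2
grad_out = torch.randn(5)
# vector-Jacobian product with an explicit grad_out ...
g1, = torch.autograd.grad(y, x, grad_outputs=grad_out, retain_graph=True)
# ... equals differentiating the scalar sum(y * grad_out)
g2, = torch.autograd.grad((y * grad_out).sum(), x)
assert torch.allclose(g1, g2)
</code></pre>
</div>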
 
<p>With these modifications applied, our model looks like this:</p>
 
 <p><img src="/images/bert-pytorch/pytorch-tvm-training_25_0.svg" alt="svg" width="100%" /></p>
 
-<p>Finally we can take the grad. As we get a lot of <code class="language-plaintext highlighter-rouge">let</code> nodes, we bring it to normal form using the <code class="language-plaintext highlighter-rouge">ToGraphNormalForm</code> pass.
-TVM’s gradient-taking returns a function that has the same parameters as the original function (in our case amended with the <code class="language-plaintext highlighter-rouge">grad_out</code> and dropout) and then returns a tuple of the original return and a tuple containing gradients for all inputs.
-The first thing we do is to drop all the gradients for <code class="language-plaintext highlighter-rouge">grad_out</code> and <code class="language-plaintext highlighter-rouge">dropout</code> which we don’t need.
+<p>Finally we can take the grad. As we get a lot of <code class="highlighter-rouge">let</code> nodes, we bring it to normal form using the <code class="highlighter-rouge">ToGraphNormalForm</code> pass.
+TVM’s gradient-taking returns a function that has the same parameters as the original function (in our case amended with the <code class="highlighter-rouge">grad_out</code> and dropout) and then returns a tuple of the original return and a tuple containing gradients for all inputs.
+The first thing we do is to drop all the gradients for <code class="highlighter-rouge">grad_out</code> and <code class="highlighter-rouge">dropout</code> which we don’t need.
 Then we run our simplification passes.</p>
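
<p>In terms of TVM APIs, this stage is roughly the following sketch (the exact sequence of simplification passes is elided, and <code class="highlighter-rouge">fn_mod</code> stands in for the module holding our amended function):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code># differentiate the amended forward function ...
grad_fn = tvm.relay.transform.gradient(fn_mod["main"], mode="higher_order")
grad_mod = tvm.IRModule.from_expr(grad_fn)
# ... and get rid of the let nodes the gradient pass produces
grad_mod = tvm.relay.transform.ToGraphNormalForm()(grad_mod)
</code></pre>
</div>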
 
 <p>So this is the graph we have now for forward and backward:</p>
@@ -520,12 +540,12 @@ split our graph. One of the difficult problems is what to do with things compute
 
 <p>We use a coloring here. First we color all nodes of the forward computation in red. Then we traverse the gradient calculation and then color the nodes it needs from the backward blue. This gives us a chance to show off the attribute support in our visualization.</p>
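
<p>A sketch of how such a coloring can be collected for the <code class="highlighter-rouge">node_attr_dict</code> our visualization uses (<code class="highlighter-rouge">fw_expr</code> and <code class="highlighter-rouge">needed_bw_expr</code> are stand-ins for the forward expression and the part of the gradient we traverse):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>node_attr = {}

def paint(attrs):
    def visit(node):
        node_attr[node] = attrs
    return visit

# forward nodes red first, then overwrite what the backward needs in blue
tvm.relay.analysis.post_order_visit(fw_expr, paint({"color": "red"}))
tvm.relay.analysis.post_order_visit(needed_bw_expr, paint({"color": "blue"}))
</code></pre>
</div>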
 
-<p>A bit of (PyTorch) terminology: When we have a function <em>Layer : x ↦ y</em> followed by some <em>Loss: y ↦ l ∈ ℝ</em>, the backward is <em>BackwardOfLayer : grad<code class="language-plaintext highlighter-rouge">_</code>out ↦ grad<code class="language-plaintext highlighter-rouge">_</code>in</em> with <em>grad<code class="language-plaintext highlighter-rouge">_</code>out = dl/dy</em> and <em>grad<code class="language-plaintext highlighter-rouge">_</code>in = dl/dx</em>.</p>
+<p>A bit of (PyTorch) terminology: When we have a function <em>Layer : x ↦ y</em> followed by some <em>Loss: y ↦ l ∈ ℝ</em>, the backward is <em>BackwardOfLayer : grad<code class="highlighter-rouge">_</code>out ↦ grad<code class="highlighter-rouge">_</code>in</em> with <em>grad<code class="highlighter-rouge">_</code>out = dl/dy</em> and <em>grad<code class="highlighter-rouge">_</code>in = dl/dx</em>.</p>
 
 <p><img src="/images/bert-pytorch/pytorch-tvm-training_34_0.svg" alt="svg" width="100%" /></p>
 
<p>In order to split the function as described above, we collect the blue nodes to capture - but constants will
-just be duplicated and inputs (<code class="language-plaintext highlighter-rouge">Var</code> nodes) need to be treated separately.
+just be duplicated and inputs (<code class="highlighter-rouge">Var</code> nodes) need to be treated separately.
 Now we can split out the backward, replacing all the blue nodes with variables.</p>
 
 <p>Next we take the forward and amend it to also return the required intermediates. The forward then looks like this:</p>
@@ -534,92 +554,100 @@ Now we can split out the backward, replacing all the blue nodes with variables.<
 
 <p>TVM cannot return nested tuples, so we flatten the output in the function. Again we differentiate between tensor-valued functions and tuple valued ones (i.e. those returning potentially multiple tensors).</p>
 
-<p>And at last, we can let TVM do its magic and compile our functions, say to <code class="language-plaintext highlighter-rouge">gr_only_compiled_module</code>
-and <code class="language-plaintext highlighter-rouge">fw_and_cap_compiled_module</code>.
+<p>And at last, we can let TVM do its magic and compile our functions, say to <code class="highlighter-rouge">gr_only_compiled_module</code>
+and <code class="highlighter-rouge">fw_and_cap_compiled_module</code>.
 Time to give it a spin. We define convenience functions to move tensors between PyTorch and TVM and get the model parameters as a TVM dictionary.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">tensor_to_tvm</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
-    <span class="k">return</span> <span class="n">tvm</span><span class="p">.</span><span class="n">nd</span><span class="p">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">dlpack</span><span class="p">.</span><span class="n">to_dlpack</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">def</span> <span class="nf">tensor_to_tvm</span><span class="p">(</span><span class="n">t</span><span class="p">):</span>
+    <span class="k">return</span> <span class="n">tvm</span><span class="o">.</span><span class="n">nd</span><span class="o">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">dlpack</span><span class="o">.</span><span class="n">to_dlpack</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
 <span class="k">def</span> <span class="nf">tensor_from_tvm</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
-    <span class="k">return</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">dlpack</span><span class="p">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">to_dlpack</span><span class="p">()))</span>
+    <span class="k">return</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">dlpack</span><span class="o">.</span><span class="n">from_dlpack</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">to_dlpack</span><span class="p">()))</span>
 
-<span class="n">model_params_tvm</span> <span class="o">=</span> <span class="p">{</span><span class="n">k</span><span class="p">:</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">pytorch_model</span><span class="p">.</span><span class="n">state_dict</span><span class="p">().</spa [...]
-</code></pre></div></div>
+<span class="n">model_params_tvm</span> <span class="o">=</span> <span class="p">{</span><span class="n">k</span><span class="p">:</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">pytorch_model</span><span class="o">.</span><span class="n">state_dict</span><span class="p">()</span [...]
+</code></pre>
+</div>
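
<p>For completeness, the compilation itself is the standard relay build flow; a sketch using the graph runtime API of recent TVM versions (<code class="highlighter-rouge">fw_and_cap_mod</code> stands in for the module holding the forward-and-capture function, and the target choice is an assumption):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>from tvm.contrib import graph_runtime

with tvm.transform.PassContext(opt_level=3):
    lib = tvm.relay.build(fw_and_cap_mod, target="cuda")
fw_and_cap_compiled_module = graph_runtime.GraphModule(lib["default"](tvm.gpu()))
</code></pre>
</div>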
 
 <p>Similarly, we get the inputs on the GPU in PyTorch and TVM.</p>
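
<p>With the helpers above, that is a one-liner (<code class="highlighter-rouge">inp_c</code> are the CUDA inputs from the notebook):</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>inp_tvm = [tensor_to_tvm(i) for i in inp_c]
</code></pre>
</div>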
 
<p>We need to deal with the dropout. It will turn out that our record of the three dropout random draws happens in the same order as the dropout in the model. We did a depth-first search on the computational graph to find them, and if the values of the dropout are connected in the graph rather than being on independent branches, this will be the order in which PyTorch draws the matrices, too. If not, good luck fiddling with the order.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">torch</span><span class="p">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
 <span class="n">drop_c</span> <span class="o">=</span> <span class="p">{}</span>
-<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">dropout_info</span><span class="p">.</span><span class="n">keys</span><span class="p">():</span> <span class="c1"># we don't know the order
-</span>    <span class="n">p</span><span class="p">,</span> <span class="n">typ</span> <span class="o">=</span> <span class="n">dropout_info</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
-    <span class="n">drop_c</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="n">functional</span><span class="p">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">ones</span><span class="p">([</span><span class="nb">int</span><span class="p">(< [...]
-                                              <span class="n">dtype</span><span class="o">=</span><span class="nb">getattr</span><span class="p">(</span><span class="n">torch</span><span class="p">,</span> <span class="n">typ</span><span class="p">.</span><span class="n">dtype</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">),</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><s [...]
+<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="n">dropout_info</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span> <span class="c"># we don't know the order</span>
+    <span class="n">p</span><span class="p">,</span> <span class="n">typ</span> <span class="o">=</span> <span class="n">dropout_info</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
+    <span class="n">drop_c</span><span class="p">[</span><span class="n">k</span><span class="p">]</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">functional</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">([</span><span class="nb">int</span><span class="p">(< [...]
+                                              <span class="n">dtype</span><span class="o">=</span><span class="nb">getattr</span><span class="p">(</span><span class="n">torch</span><span class="p">,</span> <span class="n">typ</span><span class="o">.</span><span class="n">dtype</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">),</span> <span class="n">p</span><span class="o">=</span><span class="n">p</span><s [...]
 
-<span class="n">drop_tvm</span> <span class="o">=</span> <span class="p">{</span><span class="n">n</span><span class="p">:</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">drop_c</span><span class="p">.</span><span class="n">items</span><span class="p">()}</span>
-</code></pre></div></div>
+<span class="n">drop_tvm</span> <span class="o">=</span> <span class="p">{</span><span class="n">n</span><span class="p">:</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span><span class="p">,</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">drop_c</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span>
+</code></pre>
+</div>
 
 <p>Now we can run the forward.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'input'</span><span class="p">,</span> <span class="n">inp_tvm</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
-<span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'attention_mask'</span><span class="p">,</span> <span class="n">inp_tvm</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
-<span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">model_params_tvm</span><span class="p">)</span>
-<span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">drop_tvm</span><span class="p">)</span>
-<span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'input'</span><span class="p">,</span> <span class="n">inp_tvm</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
+<span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'attention_mask'</span><span class="p">,</span> <span class="n">inp_tvm</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
+<span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">model_params_tvm</span><span class="p">)</span>
+<span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">drop_tvm</span><span class="p">)</span>
+<span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
+</code></pre>
+</div>
 
 <p>And we can compare the output to PyTorch’s:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">torch</span><span class="p">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
-<span class="n">pytorch_model</span><span class="p">.</span><span class="n">train</span><span class="p">()</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
+<span class="n">pytorch_model</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
 <span class="n">res</span> <span class="o">=</span> <span class="n">pytorch_model</span><span class="p">(</span><span class="o">*</span><span class="n">inp_c</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
-<span class="n">numpy</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="mi">0</span><span class="p">).</span><span class="n">asnumpy</span><span class="p">()</span><span class="o">-</span><span class="n">res</span><span class="p">.</span><span class="n">detach</span><span class="p">().</span><span class="n">cpu</span [...]
-</code></pre></div></div>
+<span class="n">numpy</span><span class="o">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">get_output</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span><span class="o">-</span><span class="n">res</span><span class="o">.</span><span class="n">detach</span><span class="p">()</span><sp [...]
+</code></pre>
+</div>
 
-<p>This gives <code class="language-plaintext highlighter-rouge">2.1457672e-06</code>.</p>
+<p>This gives <code class="highlighter-rouge">2.1457672e-06</code>.</p>
 
-<p>Supergood. Let’s also try the backward. We generate a <code class="language-plaintext highlighter-rouge">grad_out</code>, set all the variables, and run the backward model:</p>
+<p>Supergood. Let’s also try the backward. We generate a <code class="highlighter-rouge">grad_out</code>, set all the variables, and run the backward model:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gr_out_c</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">randn</span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">,</span> <span class="n">dtype< [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">gr_out_c</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">,</span> <span class="n">dtype</span><span class="o">= [...]
+</code></pre>
+</div>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_captures</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">capture_vars</span><span class="p">)</span>
-<span class="n">num_regular_outputs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">fw_and_cap_fn_flattened</span><span class="p">.</span><span class="n">body</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span> <span class="o">-</span> <span class="n">num_captures</span>
-<span class="n">captured_values</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="p">.</span><span class="n">name_hint</span><span class="p">:</span> <span class="n">fw_and_cap_compiled_module</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="n">num_regular_outputs</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span>< [...]
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">num_captures</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">capture_vars</span><span class="p">)</span>
+<span class="n">num_regular_outputs</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">fw_and_cap_fn_flattened</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">fields</span><span class="p">)</span> <span class="o">-</span> <span class="n">num_captures</span>
+<span class="n">captured_values</span> <span class="o">=</span> <span class="p">{</span><span class="n">v</span><span class="o">.</span><span class="n">name_hint</span><span class="p">:</span> <span class="n">fw_and_cap_compiled_module</span><span class="o">.</span><span class="n">get_output</span><span class="p">(</span><span class="n">num_regular_outputs</span> <span class="o">+</span> <span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span>< [...]
 
-<span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">drop_tvm</span><span class="p">)</span>
-<span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">model_params_tvm</span><span class="p">)</span>
-<span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">captured_values</span><span class="p">)</span>
-<span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'gr:out:0'</span><span class="p">,</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">gr_out_c</span><span class="p">))</span>
-<span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">run</span><span class="p">()</span>
-</code></pre></div></div>
+<span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">drop_tvm</span><span class="p">)</span>
+<span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">model_params_tvm</span><span class="p">)</span>
+<span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="o">**</span><span class="n">captured_values</span><span class="p">)</span>
+<span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">set_input</span><span class="p">(</span><span class="s">'gr:out:0'</span><span class="p">,</span> <span class="n">tensor_to_tvm</span><span class="p">(</span><span class="n">gr_out_c</span><span class="p">))</span>
+<span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
+</code></pre>
+</div>
 
 <p>On the PyTorch side, it is easiest to re-run the forward (remembering to reset the random seed) and get the grads.</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">torch</span><span class="p">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
-<span class="n">pytorch_model</span><span class="p">.</span><span class="n">train</span><span class="p">()</span>
-<span class="n">inp_c_rq</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span><span class="p">.</span><span class="n">requires_grad_</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">inp_c</span><span class="p">]</span>
-<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">pytorch_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">():</span>
-    <span class="n">p</span><span class="p">.</span><span class="n">requires_grad_</span><span class="p">()</span>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="n">torch</span><span class="o">.</span><span class="n">manual_seed</span><span class="p">(</span><span class="mi">12345</span><span class="p">)</span>
+<span class="n">pytorch_model</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
+<span class="n">inp_c_rq</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span><span class="o">.</span><span class="n">requires_grad_</span><span class="p">()</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">inp_c</span><span class="p">]</span>
+<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">pytorch_model</span><span class="o">.</span><span class="n">parameters</span><span class="p">():</span>
+    <span class="n">p</span><span class="o">.</span><span class="n">requires_grad_</span><span class="p">()</span>
 <span class="n">res</span> <span class="o">=</span> <span class="n">pytorch_model</span><span class="p">(</span><span class="o">*</span><span class="n">inp_c_rq</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
-<span class="n">grads_pt</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">autograd</span><span class="p">.</span><span class="n">grad</span><span class="p">(</span><span class="n">res</span><span class="p">,</span> <span class="n">inp_c_rq</span> <span class="o">+</span> <span class="nb">list</span><span class="p">(</span><span class="n">pytorch_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">()),</sp [...]
+<span class="n">grads_pt</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">autograd</span><span class="o">.</span><span class="n">grad</span><span class="p">(</span><span class="n">res</span><span class="p">,</span> <span class="n">inp_c_rq</span> <span class="o">+</span> <span class="nb">list</span><span class="p">(</span><span class="n">pytorch_model</span><span class="o">.</span><span class="n">parameters</span><span class="p">()),</sp [...]
 
-</code></pre></div></div>
+</code></pre>
+</div>
 
 <p>Did it work? It seems so:</p>
 
-<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">g_pt</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">grads_pt</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="n">numpy</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">gr_only_compiled_module</span><span class="p">.</span><span class="n">get_output</span><span class="p">(</span><span class="n">i</span><span class="p">).</span><span class="n">asnumpy</span><span class="p">()</span> <span class="o">-</span> <span class="n">g_pt</span><span class="p">.</span><span class="n">cpu</span [...]
-</code></pre></div></div>
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">g_pt</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">grads_pt</span><span class="p">):</span>
+    <span class="k">print</span><span class="p">(</span><span class="n">numpy</span><span class="o">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">gr_only_compiled_module</span><span class="o">.</span><span class="n">get_output</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="o">.</span><span class="n">asnumpy</span><span class="p">()</span> <span class="o">-</span> <span class="n">g_pt</span><span class="o">.</span><s [...]
+</code></pre>
+</div>
 
 <p>gives us a list of numbers in the 1e-5ish range.</p>
 
 <p>But we wanted to get something running in PyTorch, right?</p>
 
-<p>Keeping with how PyTorch works, we first define an <code class="language-plaintext highlighter-rouge">autograd.Function</code> that does the things we just did manually:</p>
+<p>Keeping with how PyTorch works, we first define an <code class="highlighter-rouge">autograd.Function</code> that does the things we just did manually:</p>
 
-<p>In the <code class="language-plaintext highlighter-rouge">forward</code>:</p>
+<p>In the <code class="highlighter-rouge">forward</code>:</p>
 
 <ul>
   <li>Generate the dropout random values,</li>
@@ -627,11 +655,11 @@ Time to give it a spin. We define convenience functions to move tensors between
   <li>Record the captures, inputs, and dropout values needed for backward.</li>
 </ul>
 
-<p>In the <code class="language-plaintext highlighter-rouge">backward</code>, run the backward and return the result (as PyTorch tensors).</p>
+<p>In the <code class="highlighter-rouge">backward</code>, run the backward and return the result (as PyTorch tensors).</p>
 
<p>With that, we get a PyTorch autograd.Function calling into TVM (we would want a small wrapper for that).</p>
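
<p>A hedged sketch of such an <code class="highlighter-rouge">autograd.Function</code>, reusing the names from above; the dropout plumbing (<code class="highlighter-rouge">make_dropout_masks</code>) is a hypothetical helper, and parameter gradients are glossed over here:</p>

<div class="language-python highlighter-rouge"><pre class="highlight"><code>import torch

class TVMLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, attention_mask):
        # generate the dropout masks (hypothetical helper) and run the
        # forward-and-capture module on DLPack-converted inputs
        drop = {n: tensor_to_tvm(t) for n, t in make_dropout_masks().items()}
        fw_and_cap_compiled_module.set_input('input', tensor_to_tvm(inp))
        fw_and_cap_compiled_module.set_input('attention_mask',
                                             tensor_to_tvm(attention_mask))
        fw_and_cap_compiled_module.set_input(**model_params_tvm)
        fw_and_cap_compiled_module.set_input(**drop)
        fw_and_cap_compiled_module.run()
        # record the captures and dropout values needed for the backward
        ctx.drop = drop
        ctx.captures = {
            v.name_hint:
            fw_and_cap_compiled_module.get_output(num_regular_outputs + i)
            for i, v in enumerate(capture_vars)}
        return tensor_from_tvm(fw_and_cap_compiled_module.get_output(0))

    @staticmethod
    def backward(ctx, grad_out):
        # feed grad_out and the saved state into the backward module
        gr_only_compiled_module.set_input(**ctx.drop)
        gr_only_compiled_module.set_input(**model_params_tvm)
        gr_only_compiled_module.set_input(**ctx.captures)
        gr_only_compiled_module.set_input('gr:out:0',
                                          tensor_to_tvm(grad_out.contiguous()))
        gr_only_compiled_module.run()
        # gradients for the two module inputs, back as PyTorch tensors
        return (tensor_from_tvm(gr_only_compiled_module.get_output(0)),
                tensor_from_tvm(gr_only_compiled_module.get_output(1)))

# usage: out = TVMLayer.apply(inp, attention_mask)
</code></pre>
</div>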
 
-<p>Now all we need to do to achieve our goal of getting a method <code class="language-plaintext highlighter-rouge">add_tvm_dispatch(module, sample_inputs)</code> is
+<p>Now all we need to do to achieve our goal of getting a method <code class="highlighter-rouge">add_tvm_dispatch(module, sample_inputs)</code> is
 to trace the module, create the TVM-based autograd function from it and then replace the forward that calls
... 22302 lines suppressed ...