You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by tq...@apache.org on 2022/11/16 01:28:30 UTC

[tvm-site] branch asf-site updated: deploying docs (apache/tvm@4f4b4edafdb0837972e4df6f570368f9f7cefd20)

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 34efd0733f deploying docs (apache/tvm@4f4b4edafdb0837972e4df6f570368f9f7cefd20)
34efd0733f is described below

commit 34efd0733f3c410bf4a6b480a94f501bb1cdfc4c
Author: tvm-bot <95...@users.noreply.github.com>
AuthorDate: Wed Nov 16 01:28:23 2022 +0000

    deploying docs (apache/tvm@4f4b4edafdb0837972e4df6f570368f9f7cefd20)
---
 docs/_images/sphx_glr_micro_train_001.png          | Bin 329141 -> 328241 bytes
 docs/_images/sphx_glr_micro_train_thumb.png        | Bin 22792 -> 23601 bytes
 docs/_sources/contribute/release_process.rst.txt   |  27 +-
 .../how_to/compile_models/from_darknet.rst.txt     |   2 +-
 .../how_to/compile_models/from_keras.rst.txt       |   2 +-
 .../how_to/compile_models/from_mxnet.rst.txt       |   2 +-
 .../how_to/compile_models/from_oneflow.rst.txt     |   2 +-
 .../how_to/compile_models/from_pytorch.rst.txt     |   2 +-
 .../how_to/compile_models/from_tensorflow.rst.txt  |   2 +-
 .../compile_models/sg_execution_times.rst.txt      |  22 +-
 .../deploy_models/deploy_model_on_android.rst.txt  |   2 +-
 .../deploy_object_detection_pytorch.rst.txt        |   4 +-
 .../deploy_models/deploy_prequantized.rst.txt      |   6 +-
 .../deploy_prequantized_tflite.rst.txt             |   4 +-
 .../how_to/deploy_models/deploy_quantized.rst.txt  |   2 +-
 .../deploy_models/deploy_ssd_gluoncv.rst.txt       |   4 +-
 .../deploy_models/sg_execution_times.rst.txt       |  20 +-
 .../extend_tvm/bring_your_own_datatypes.rst.txt    |   2 +-
 .../how_to/extend_tvm/sg_execution_times.rst.txt   |   8 +-
 .../how_to/extend_tvm/use_pass_instrument.rst.txt  |  16 +-
 .../optimize_operators/opt_conv_cuda.rst.txt       |   2 +-
 .../optimize_operators/opt_conv_tensorcore.rst.txt |   2 +-
 .../how_to/optimize_operators/opt_gemm.rst.txt     |  16 +-
 .../optimize_operators/sg_execution_times.rst.txt  |   8 +-
 .../sg_execution_times.rst.txt                     |  14 +-
 .../tune_conv2d_layer_cuda.rst.txt                 |   4 +-
 .../tune_network_cuda.rst.txt                      |   4 +-
 .../tune_network_x86.rst.txt                       |   4 +-
 .../tune_sparse_x86.rst.txt                        | 177 +++++-------
 .../tune_with_autotvm/sg_execution_times.rst.txt   |  10 +-
 .../tune_with_autotvm/tune_conv2d_cuda.rst.txt     | 312 ++++++++++-----------
 .../work_with_microtvm/micro_autotune.rst.txt      |  16 +-
 .../work_with_microtvm/micro_pytorch.rst.txt       |   4 +-
 .../how_to/work_with_microtvm/micro_train.rst.txt  |  18 +-
 .../work_with_microtvm/sg_execution_times.rst.txt  |  12 +-
 .../work_with_relay/sg_execution_times.rst.txt     |   8 +-
 .../how_to/work_with_schedules/intrin_math.rst.txt |   2 +-
 .../work_with_schedules/sg_execution_times.rst.txt |  18 +-
 .../how_to/work_with_schedules/tensorize.rst.txt   |   2 +-
 .../tutorials/autotvm/sg_execution_times.rst.txt   |   4 +-
 .../frontend/deploy_classification.rst.txt         |   2 +-
 .../tutorials/frontend/deploy_detection.rst.txt    |   2 +-
 .../tutorials/frontend/sg_execution_times.rst.txt  |   6 +-
 .../tutorials/optimize/sg_execution_times.rst.txt  |   6 +-
 .../topic/vta/tutorials/sg_execution_times.rst.txt |   6 +-
 .../tutorial/auto_scheduler_matmul_x86.rst.txt     |  13 +-
 docs/_sources/tutorial/autotvm_matmul_x86.rst.txt  |  20 +-
 docs/_sources/tutorial/autotvm_relay_x86.rst.txt   |  61 ++--
 .../tutorial/cross_compilation_and_rpc.rst.txt     |   2 +-
 docs/_sources/tutorial/intro_topi.rst.txt          |   2 +-
 docs/_sources/tutorial/sg_execution_times.rst.txt  |  22 +-
 .../tutorial/tensor_expr_get_started.rst.txt       |  44 +--
 docs/commit_hash                                   |   2 +-
 docs/contribute/index.html                         |   2 +-
 docs/contribute/release_process.html               |  31 +-
 docs/how_to/compile_models/from_darknet.html       |   2 +-
 docs/how_to/compile_models/from_keras.html         |   2 +-
 docs/how_to/compile_models/from_mxnet.html         |   2 +-
 docs/how_to/compile_models/from_oneflow.html       |  14 +-
 docs/how_to/compile_models/from_pytorch.html       |  11 +-
 docs/how_to/compile_models/from_tensorflow.html    |   2 +-
 docs/how_to/compile_models/sg_execution_times.html |  26 +-
 .../deploy_models/deploy_model_on_android.html     |   2 +-
 .../deploy_object_detection_pytorch.html           |  31 +-
 docs/how_to/deploy_models/deploy_prequantized.html |   8 +-
 .../deploy_models/deploy_prequantized_tflite.html  |   4 +-
 docs/how_to/deploy_models/deploy_quantized.html    |   2 +-
 docs/how_to/deploy_models/deploy_ssd_gluoncv.html  |  36 ++-
 docs/how_to/deploy_models/sg_execution_times.html  |  20 +-
 .../extend_tvm/bring_your_own_datatypes.html       |   2 +-
 docs/how_to/extend_tvm/sg_execution_times.html     |   8 +-
 docs/how_to/extend_tvm/use_pass_instrument.html    |  16 +-
 docs/how_to/optimize_operators/opt_conv_cuda.html  |   2 +-
 .../optimize_operators/opt_conv_tensorcore.html    |   2 +-
 docs/how_to/optimize_operators/opt_gemm.html       |  16 +-
 .../optimize_operators/sg_execution_times.html     |   8 +-
 .../sg_execution_times.html                        |  14 +-
 .../tune_conv2d_layer_cuda.html                    |   4 +-
 .../tune_with_autoscheduler/tune_network_cuda.html |   4 +-
 .../tune_with_autoscheduler/tune_network_x86.html  |   4 +-
 .../tune_with_autoscheduler/tune_sparse_x86.html   | 177 +++++-------
 .../tune_with_autotvm/sg_execution_times.html      |  10 +-
 .../how_to/tune_with_autotvm/tune_conv2d_cuda.html | 312 ++++++++++-----------
 docs/how_to/work_with_microtvm/micro_autotune.html |  16 +-
 docs/how_to/work_with_microtvm/micro_pytorch.html  |   4 +-
 docs/how_to/work_with_microtvm/micro_train.html    |  16 +-
 .../work_with_microtvm/sg_execution_times.html     |  12 +-
 .../how_to/work_with_relay/sg_execution_times.html |   8 +-
 docs/how_to/work_with_schedules/intrin_math.html   |   2 +-
 .../work_with_schedules/sg_execution_times.html    |  18 +-
 docs/how_to/work_with_schedules/tensorize.html     |   2 +-
 docs/reference/api/python/auto_scheduler.html      |   4 +-
 .../api/typedoc/classes/bytestreamreader.html      |  12 +-
 .../api/typedoc/classes/cachedcallstack.html       |  34 +--
 docs/reference/api/typedoc/classes/dldatatype.html |  12 +-
 docs/reference/api/typedoc/classes/dldevice.html   |  10 +-
 .../reference/api/typedoc/classes/environment.html |  12 +-
 docs/reference/api/typedoc/classes/ffilibrary.html |  20 +-
 .../api/typedoc/classes/graphexecutor.html         |  16 +-
 docs/reference/api/typedoc/classes/instance.html   |  40 +--
 docs/reference/api/typedoc/classes/memory.html     |  34 +--
 docs/reference/api/typedoc/classes/module.html     |  10 +-
 docs/reference/api/typedoc/classes/ndarray.html    |  22 +-
 .../api/typedoc/classes/packedfunccell.html        |   6 +-
 docs/reference/api/typedoc/classes/rpcserver.html  |  14 +-
 docs/reference/api/typedoc/classes/scalar.html     |   6 +-
 .../api/typedoc/classes/webgpucontext.html         |  12 +-
 docs/reference/api/typedoc/enums/argtypecode.html  |  30 +-
 .../api/typedoc/enums/aynccallbackcode.html        |   4 +-
 .../api/typedoc/enums/dldatatypecode.html          |   8 +-
 .../api/typedoc/enums/rpcserverstate.html          |  12 +-
 docs/reference/api/typedoc/enums/sizeof.html       |  18 +-
 docs/reference/api/typedoc/index.html              | 112 ++++----
 .../api/typedoc/interfaces/disposable.html         |   2 +-
 .../api/typedoc/interfaces/functioninfo.html       |   6 +-
 .../api/typedoc/interfaces/libraryprovider.html    |   4 +-
 docs/searchindex.js                                |   2 +-
 .../vta/tutorials/autotvm/sg_execution_times.html  |   4 +-
 .../tutorials/frontend/deploy_classification.html  |   2 +-
 .../vta/tutorials/frontend/deploy_detection.html   |   2 +-
 .../vta/tutorials/frontend/sg_execution_times.html |   6 +-
 .../vta/tutorials/optimize/sg_execution_times.html |   6 +-
 docs/topic/vta/tutorials/sg_execution_times.html   |   6 +-
 docs/tutorial/auto_scheduler_matmul_x86.html       |   9 +-
 docs/tutorial/autotvm_matmul_x86.html              |  20 +-
 docs/tutorial/autotvm_relay_x86.html               | 278 +++++++++---------
 docs/tutorial/cross_compilation_and_rpc.html       |   2 +-
 docs/tutorial/intro_topi.html                      |   2 +-
 docs/tutorial/sg_execution_times.html              |  26 +-
 docs/tutorial/tensor_expr_get_started.html         |  44 +--
 130 files changed, 1270 insertions(+), 1405 deletions(-)

diff --git a/docs/_images/sphx_glr_micro_train_001.png b/docs/_images/sphx_glr_micro_train_001.png
index 9c86215278..4df2440260 100644
Binary files a/docs/_images/sphx_glr_micro_train_001.png and b/docs/_images/sphx_glr_micro_train_001.png differ
diff --git a/docs/_images/sphx_glr_micro_train_thumb.png b/docs/_images/sphx_glr_micro_train_thumb.png
index 839a4f5975..ed08deb0d3 100644
Binary files a/docs/_images/sphx_glr_micro_train_thumb.png and b/docs/_images/sphx_glr_micro_train_thumb.png differ
diff --git a/docs/_sources/contribute/release_process.rst.txt b/docs/_sources/contribute/release_process.rst.txt
index 4b5c45fc84..463536f200 100644
--- a/docs/_sources/contribute/release_process.rst.txt
+++ b/docs/_sources/contribute/release_process.rst.txt
@@ -67,7 +67,7 @@ You can skip this section if you have already uploaded your key.
 
 After generating the gpg key, you need to upload your key to a public key server. Please refer to https://www.apache.org/dev/openpgp.html#generate-key for details.
 
-If you want to do the release on another machine, you can transfer your gpg key to that machine via the :code:`gpg --export` and :code:`gpg --import` commands.
+If you want to do the release on another machine, you can transfer your gpg key to that machine via the ``gpg --export`` and ``gpg --import`` commands.
 
 The last step is to update the KEYS file with your code signing key https://www.apache.org/dev/openpgp.html#export-public-key. Check in the changes to the TVM main branch, as well as ASF SVN,
 
@@ -96,17 +96,17 @@ To cut a release candidate, one needs to first cut a branch using selected versi
 	git branch v0.6.0
 	git push --set-upstream origin v0.6.0
 
-(*Make sure the version numbers in the source code are correct.* Run :code:`python3 version.py` to update the version.)
+(*Make sure the version numbers in the source code are correct.* Run ``python3 version.py`` to update the version.)
 
 Go to the GitHub repositories "releases" tab and click "Draft a new release",
 
-- Provide the release tag in the form of “v1.0.0.rc0” where 0 means it’s the first release candidate
+- Provide the release tag in the form of ``v1.0.0.rc0`` where 0 means it's the first release candidate. The tag must match this pattern ``v[0-9]+\.[0-9]+\.[0-9]+\.rc[0-9]`` exactly!
 - Select the commit by clicking Target: branch > Recent commits > $commit_hash
 - Copy and paste release note draft into the description box
 - Select "This is a pre-release"
 - Click "Publish release"
 
-Notice that one can still apply changes to the BRANCH after the cut, while the TAG is fixed. If any change is required for this release, a new TAG has to be created.
+Notice that one can still apply changes to the branch after the cut, while the tag is fixed. If any change is required for this release, a new tag has to be created.
 
 Remove previous release candidate (if applied),
 
@@ -145,12 +145,15 @@ Create GPG signature as well as the hash of the file,
 	shasum -a 512 apache-tvm-src-v0.6.0.rc0.tar.gz > apache-tvm-src-v0.6.0.rc0.tar.gz.sha512
 
 
-Update TVM Version on Main
---------------------------
+Update TVM Version on ``main``
+------------------------------
 
-After cutting a release candidate, make sure to update the version numbers throughout `main`. For example if we are
-releasing `v0.10.0` we want to bump the version numbers throughout the codebase from `v0.10.dev0` to `v0.11.dev0`. An
+After cutting a release candidate, make sure to update the version numbers throughout ``main``. For example if we are
+releasing ``v0.10.0`` we want to bump the version numbers throughout the codebase from ``v0.10.dev0`` to ``v0.11.dev0``. An
 example of how to do this can be found here: `https://github.com/apache/tvm/pull/12190 <https://github.com/apache/tvm/pull/12190>`_.
+Tag the commit on ``main`` immediately after the last one included in the release branch with the dev tag (e.g. ``v0.11.dev0``)
+for the next release. This tag is necessary so that the nightly packages built from ``main`` have the correct version
+number.
 
 Upload the Release Candidate
 ----------------------------
@@ -173,7 +176,7 @@ The release manager also needs to upload the artifacts to ASF SVN,
 Call a Vote on the Release Candidate
 ------------------------------------
 
-The first voting takes place on the Apache TVM developers list (dev@tvm.apache.org). To get more attention, one can create a github issue start with "[VOTE]" instead, it will be mirrored to dev@ automatically. Look at past voting threads to see how this proceeds. The email should follow this format.
+The first voting takes place on the Apache TVM developers list (dev@tvm.apache.org). To get more attention, one can create a GitHub issue start with "[VOTE]" instead, it will be mirrored to dev@ automatically. Look at past voting threads to see how this proceeds. The email should follow this format.
 
 - Provide the link to the draft of the release notes in the email
 - Provide the link to the release candidate artifacts
@@ -181,9 +184,9 @@ The first voting takes place on the Apache TVM developers list (dev@tvm.apache.o
 
 For the dev@ vote, there must be at least 3 binding +1 votes and more +1 votes than -1 votes. Once the vote is done, you should also send out a summary email with the totals, with a subject that looks something like [VOTE][RESULT] ....
 
-In ASF, votes are open "at least" 72hrs (3 days). If you don't get enough number of binding votes within that time, you cannot close the voting deadline. You need to extend it.
+In ASF, votes are open at least 72 hours (3 days). If you don't get enough number of binding votes within that time, you cannot close the voting deadline. You need to extend it.
 
-If the voting fails, the community needs to modified the release accordingly, create a new release candidate and re-run the voting process.
+If the vote fails, the community needs to modify the release accordingly: create a new release candidate and re-run the voting process.
 
 
 Post the Release
@@ -212,7 +215,7 @@ Remember to create a new release TAG (v0.6.0 in this case) on Github and remove
 Update the TVM Website
 ----------------------
 
-The website repository is located at `https://github.com/apache/tvm-site <https://github.com/apache/tvm-site>`_. Modify the download page to include the release artifacts as well as the GPG signature and SHA hash. Since TVM's docs are continually updated, upload a fixed version of the release docs. If CI has deleted the docs from the release by the time you go to update the website, you can restart the CI build for the release branch on Jenkins. See the example code below for a starting point
+The website repository is located at `https://github.com/apache/tvm-site <https://github.com/apache/tvm-site>`_. Modify the download page to include the release artifacts as well as the GPG signature and SHA hash. Since TVM's docs are continually updated, upload a fixed version of the release docs. If CI has deleted the docs from the release by the time you go to update the website, you can restart the CI build for the release branch on Jenkins. See the example code below for a starting point.
 
 .. code-block:: bash
 
diff --git a/docs/_sources/how_to/compile_models/from_darknet.rst.txt b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
index 3ebba43850..9df17dd128 100644
--- a/docs/_sources/how_to/compile_models/from_darknet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_darknet.rst.txt
@@ -315,7 +315,7 @@ The process is no different from other examples.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  13.891 seconds)
+   **Total running time of the script:** ( 1 minutes  11.203 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_darknet.py:
diff --git a/docs/_sources/how_to/compile_models/from_keras.rst.txt b/docs/_sources/how_to/compile_models/from_keras.rst.txt
index 267dd323b5..36ec053d83 100644
--- a/docs/_sources/how_to/compile_models/from_keras.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_keras.rst.txt
@@ -228,7 +228,7 @@ Look up prediction top 1 index in 1000 class synset.
  .. code-block:: none
 
     Relay top-1 id: 285, class name: Egyptian cat
-
    1/1 [==============================] - ETA: 0s
    1/1 [==============================] - 1s 995ms/step
+
    1/1 [==============================] - ETA: 0s
    1/1 [==============================] - 1s 929ms/step
     Keras top-1 id: 285, class name: Egyptian cat
 
 
diff --git a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
index dc27167b0a..683800fd40 100644
--- a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
@@ -115,7 +115,7 @@ In this section, we download a pretrained imagenet model and classify an image.
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zipdd43512b-d571-453b-ae60-5a1e206a0d73 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip30e813d4-e803-4991-a5d6-cbe7b7326dff from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
     x (1, 3, 224, 224)
 
 
diff --git a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
index c90c017b8c..8a256e72df 100644
--- a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
@@ -116,7 +116,7 @@ Load a pretrained OneFlow model and save model
  .. code-block:: none
 
     Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
-
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
     15%|#5        | 6.33M/41.5M [00:00<00:00, 55.5MB/s]
     30%|###       | 12.6M/41.5M [00:00<00:00, 60.8MB/s]
     44%|####4     | 18.4M/41.5M [00:00<00:00, 42.9MB/s]
     55%|#####5    | 22.9M/41.5M [00:00<00:00, 43.3MB/s]
     66%|######5   | 27.3M/41.5M [00:00<00:00, 40.6MB/s]
     82%|########2 | 34.1M/41.5M [00:00<00:00, 45.8MB/s]
    100%|##########| 41.5M/41.5M [00:00<00:00, 50.2MB/s]
+
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
     15%|#5        | 6.33M/41.5M [00:00<00:01, 31.2MB/s]
     22%|##2       | 9.30M/41.5M [00:00<00:01, 30.5MB/s]
     39%|###8      | 16.0M/41.5M [00:00<00:00, 36.2MB/s]
     58%|#####7    | 24.0M/41.5M [00:00<00:00, 41.5MB/s]
     77%|#######7  | 32.0M/41.5M [00:00<00:00, 49.7MB/s]
     92%|#########2| 38.3M/41.5M [00:00<00:00, 45.3MB/s]
    100%|##########| 41.5M/41.5M [00:01<00:00, 43.0MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
index 255ab4e3a8..01c34004df 100644
--- a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
@@ -98,7 +98,7 @@ Load a pretrained PyTorch model
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet18_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet18_Weights.DEFAULT` to get the most up-to-date weights.
       warnings.warn(msg)
     Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
     22%|##1       | 9.78M/44.7M [00:00<00:00, 103MB/s]
     44%|####3     | 19.6M/44.7M [00:00<00:00, 82.4MB/s]
     72%|#######1  | 32.0M/44.7M [00:00<00:00, 93.3MB/s]
     92%|#########1| 41.0M/44.7M [00:00<00:00, 86.2MB/s]
    100%|##########| 44.7M/44.7M [00:00<00:00, 91.4MB/s]
+
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
     18%|#7        | 7.99M/44.7M [00:00<00:00, 45.6MB/s]
     36%|###5      | 16.0M/44.7M [00:00<00:00, 48.6MB/s]
     54%|#####3    | 24.0M/44.7M [00:00<00:00, 58.9MB/s]
     72%|#######1  | 32.0M/44.7M [00:00<00:00, 55.0MB/s]
     89%|########8 | 39.7M/44.7M [00:00<00:00, 61.9MB/s]
    100%|##########| 44.7M/44.7M [00:00<00:00, 56.8MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
index 6e780d7599..14a1c30811 100644
--- a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
@@ -416,7 +416,7 @@ Run the corresponding model on tensorflow
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  14.359 seconds)
+   **Total running time of the script:** ( 1 minutes  9.766 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_tensorflow.py:
diff --git a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
index b18a0e42cb..7810a812cc 100644
--- a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
@@ -5,26 +5,26 @@
 
 Computation times
 =================
-**05:57.860** total execution time for **how_to_compile_models** files:
+**05:42.882** total execution time for **how_to_compile_models** files:
 
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:14.359 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 01:11.203 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)       | 01:13.891 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``) | 01:09.766 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 00:47.706 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)         | 00:45.303 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:33.305 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)       | 00:33.366 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:31.147 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)           | 00:29.678 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:28.093 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)         | 00:26.182 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:25.135 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)         | 00:24.741 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:23.170 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)       | 00:22.804 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:18.628 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)           | 00:17.462 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.425 | 0.0 MB |
+| :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)             | 00:02.377 | 0.0 MB |
 +-----------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
index 5649a6b271..601cf91774 100644
--- a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
@@ -434,7 +434,7 @@ Execute on TVM
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      16.3089      16.2773      16.5331      16.2045       0.0956   
+      16.0491      16.0210      16.2344      15.9366       0.1045   
                
 
 
diff --git a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
index 197f1c0653..2f7dbbe162 100644
--- a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
@@ -127,7 +127,7 @@ Load pre-trained maskrcnn from torchvision and do tracing
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=MaskRCNN_ResNet50_FPN_Weights.COCO_V1`. You can also use `weights=MaskRCNN_ResNet50_FPN_Weights.DEFAULT` to get the most up-to-date weights.
       warnings.warn(msg)
     Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
-
      0%|          | 0.00/170M [00:00<?, ?B/s]
      7%|7         | 12.0M/170M [00:00<00:01, 126MB/s]
     14%|#4        | 24.1M/170M [00:00<00:01, 107MB/s]
     20%|##        | 34.5M/170M [00:00<00:01, 92.1MB/s]
     28%|##7       | 47.5M/170M [00:00<00:01, 107MB/s] 
     34%|###4      | 58.0M/170M [00:00<00:01, 104MB/s]
     40%|####      | 68.2M/170M [00:00<00:01, 101MB/s]
     46%|####6     | 78.4M/170M [00:00<00:00, 103MB/s]
     52%|#####1    | 88.3M/170M [00:00<00:00, 101MB/s]
     58%|#####7    | 98.0M/170M [00:01<00:00, 95.8MB/s]
     64%|######4   | 109M/170M [00:01<00:00, 103MB/s]  
     70%|#######   | 119M/170M [00:01<00:00, 102MB/s]
     76%|#######6  | 129M/170M [00:01<00:00, 102MB/s]
     82%|########1 | 139M/170M [00:01<00:00, 93.9MB/s]
     89%|########8 | 151M/170M [00:01<00:00, 103MB/s] 
     95%|#########4| 161M/170M [00:01<00:00, 102MB/s]
    100%|##########| 170M/170M [00:01<00:00, 101MB/s]
+
      0%|          | 0.00/170M [00:00<?, ?B/s]
      5%|4         | 7.99M/170M [00:00<00:02, 64.4MB/s]
     14%|#4        | 24.0M/170M [00:00<00:01, 103MB/s] 
     24%|##3       | 40.0M/170M [00:00<00:01, 112MB/s]
     33%|###2      | 55.8M/170M [00:00<00:00, 130MB/s]
     40%|####      | 68.5M/170M [00:00<00:00, 118MB/s]
     47%|####7     | 80.1M/170M [00:00<00:00, 94.8MB/s]
     56%|#####5    | 94.9M/170M [00:00<00:00, 110MB/s] 
     63%|######2   | 106M/170M [00:01<00:00, 101MB/s] 
     69%|######8   | 117M/170M [00:01<00:00, 101MB/s]
     75%|#######4  | 127M/170M [00:01<00:00, 95.9MB/s]
     80%|########  | 136M/170M [00:01<00:00, 83.1MB/s]
     88%|########7 | 149M/170M [00:01<00:00, 95.0MB/s]
     93%|#########3| 158M/170M [00:01<00:00, 95.5MB/s]
    100%|#########9| 169M/170M [00:01<00:00, 101MB/s] 
    100%|##########| 170M/170M [00:01<00:00, 101MB/s]
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torch/nn/functional.py:3897: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
       for i in range(dim)
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torchvision/models/detection/anchor_utils.py:124: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -296,7 +296,7 @@ Get boxes with score larger than 0.9
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 3 minutes  27.056 seconds)
+   **Total running time of the script:** ( 3 minutes  11.260 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_object_detection_pytorch.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
index 8fc3210a11..430aba92d2 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
@@ -236,7 +236,7 @@ training. Other models require a full post training calibration.
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=MobileNet_V2_Weights.IMAGENET1K_V1`. You can also use `weights=MobileNet_V2_Weights.DEFAULT` to get the most up-to-date weights.
       warnings.warn(msg)
     Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
-
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
     91%|######### | 12.3M/13.6M [00:00<00:00, 129MB/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 125MB/s]
+
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
     90%|########9 | 12.2M/13.6M [00:00<00:00, 81.9MB/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 87.5MB/s]
 
 
 
@@ -418,7 +418,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      90.4693      90.3698      97.0182      90.1397       0.6845   
+      90.1403      90.0314      93.9069      89.8491       0.4606   
                
 
 
@@ -467,7 +467,7 @@ TODO
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  16.350 seconds)
+   **Total running time of the script:** ( 1 minutes  5.223 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
index 16ccac3dce..3febb2e44d 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
@@ -432,7 +432,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      121.7389     121.6763     123.5499     120.7672      0.4968   
+      117.2063     117.1382     118.6567     115.6592      0.6188   
                
 
 
@@ -469,7 +469,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  30.045 seconds)
+   **Total running time of the script:** ( 2 minutes  27.522 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized_tflite.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
index 2edc3962ba..874803a76a 100644
--- a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
@@ -253,7 +253,7 @@ We create a Relay VM to build and execute the model.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  37.909 seconds)
+   **Total running time of the script:** ( 1 minutes  33.976 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_quantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
index 7cb2270ef7..65b29c1e56 100644
--- a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
@@ -166,7 +166,7 @@ Convert and compile model for CPU.
             data: None
       input_sym_arg_type = in_param.infer_type()[0]
     Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
-
      0%|          | 0/132723 [00:00<?, ?KB/s]
      3%|2         | 3325/132723 [00:00<00:03, 33245.91KB/s]
      8%|8         | 11015/132723 [00:00<00:02, 58921.21KB/s]
     13%|#2        | 16908/132723 [00:00<00:01, 58822.34KB/s]
     19%|#8        | 24725/132723 [00:00<00:01, 66450.81KB/s]
     25%|##4       | 32756/132723 [00:00<00:01, 71438.14KB/s]
     31%|###       | 40769/132723 [00:00<00:01, 74389.66KB/s]
     37%|###6      | 48797/132723 [00:00<00:01, 76310.49KB/s]
     43%|####2     | 56429/132723 [00:00<00:01, 73766.86KB/s]
     49%|####8     | 64476/132723 [00:00<00:00, 75811.18KB/s]
     55%|#####4    | 72473/132723 [00:01<00:00, 77072.91KB/s]
     61%|######    | 80540/132723 [00:01<00:00, 78159.04KB/s]
     67%|######6   | 88578/132723 [00:01<00:00, 78826.03KB/s]
     73%|#######2  | 96469/132723 [00:01<00:00, 77646.95KB/s]
     79%|#######8  | 104513/132723 [00:01<00:00, 78476.18KB/s]
     85%|########4 | 112563/132723 [00:01<00:00, 79076.44KB/s]
     91%|#########
  | 120571/132723 [00:01<00:00, 79371.77KB/s]
     97%|#########6| 128553/132723 [00:01<00:00, 79504.81KB/s]
    100%|##########| 132723/132723 [00:01<00:00, 75020.34KB/s]
+
      0%|          | 0/132723 [00:00<?, ?KB/s]
      4%|4         | 5865/132723 [00:00<00:02, 58642.65KB/s]
     11%|#         | 14343/132723 [00:00<00:01, 74015.15KB/s]
     17%|#7        | 22921/132723 [00:00<00:01, 79384.37KB/s]
     24%|##3       | 31499/132723 [00:00<00:01, 81906.82KB/s]
     30%|###       | 40181/132723 [00:00<00:01, 83677.13KB/s]
     37%|###6      | 48729/132723 [00:00<00:00, 84285.80KB/s]
     43%|####3     | 57389/132723 [00:00<00:00, 85041.02KB/s]
     50%|####9     | 66044/132723 [00:00<00:00, 85518.49KB/s]
     56%|#####6    | 74666/132723 [00:00<00:00, 85731.42KB/s]
     63%|######2   | 83240/132723 [00:01<00:00, 84079.34KB/s]
     69%|######9   | 91874/132723 [00:01<00:00, 84758.89KB/s]
     76%|#######5  | 100356/132723 [00:01<00:00, 78778.61KB/s]
     82%|########2 | 109027/132723 [00:01<00:00, 81042.84KB/s]
     88%|########8 | 117203/132723 [00:01<00:00, 65725.00KB/s]
     95%|#########4| 125791/132723 [00:01<00:00, 70780.02KB/s]
    100%|#######
 ###| 132723/132723 [00:01<00:00, 78266.41KB/s]
 
 
 
@@ -242,7 +242,7 @@ Display result
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 3 minutes  9.754 seconds)
+   **Total running time of the script:** ( 2 minutes  57.338 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_ssd_gluoncv.py:
diff --git a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
index e3f8244968..3deeae93c9 100644
--- a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
@@ -5,24 +5,24 @@
 
 Computation times
 =================
-**13:30.469** total execution time for **how_to_deploy_models** files:
+**12:40.661** total execution time for **how_to_deploy_models** files:
 
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 03:27.056 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``) | 03:11.260 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 03:09.754 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)                           | 02:57.338 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 02:30.045 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)           | 02:27.522 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:37.909 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)                               | 01:33.976 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:16.350 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)                         | 01:05.223 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:37.348 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)                 | 00:35.402 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_nano.py` (``deploy_model_on_nano.py``)                       | 00:26.329 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_nano.py` (``deploy_model_on_nano.py``)                       | 00:25.209 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:25.671 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)                       | 00:24.724 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)                                     | 00:00.007 | 0.0 MB |
+| :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)                                     | 00:00.006 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
index d931da2b58..0c03b4991e 100644
--- a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
@@ -472,7 +472,7 @@ First let us define two helper functions to get the mobilenet model and a cat im
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip3d1403ad-5b47-4a55-84bd-6f0dae493bdf from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipee89f252-ece6-4f02-9764-1380ae97e3fb from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 
 
 
diff --git a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
index 5fda58fd36..823a2fa724 100644
--- a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**00:49.285** total execution time for **how_to_extend_tvm** files:
+**00:46.825** total execution time for **how_to_extend_tvm** files:
 
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:45.681 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``) | 00:43.462 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.519 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)           | 00:02.347 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:01.077 | 0.0 MB |
+| :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)                     | 00:01.007 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)       | 00:00.008 | 0.0 MB |
 +-------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
index e6968fbf51..598940fd11 100644
--- a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
@@ -216,10 +216,10 @@ profile the execution time of each passes.
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 7636us [7636us] (47.04%; 47.04%)
-    FoldScaleAxis: 8598us [8us] (52.96%; 52.96%)
-            FoldConstant: 8589us [1731us] (52.91%; 99.91%)
-                    InferType: 6858us [6858us] (42.25%; 79.84%)
+    InferType: 7103us [7103us] (46.13%; 46.13%)
+    FoldScaleAxis: 8295us [6us] (53.87%; 53.87%)
+            FoldConstant: 8289us [1675us] (53.83%; 99.93%)
+                    InferType: 6614us [6614us] (42.95%; 79.80%)
 
 
 
@@ -258,10 +258,10 @@ Refer to following sections and :py:func:`tvm.instrument.pass_instrument` for th
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 7055us [7055us] (45.37%; 45.37%)
-    FoldScaleAxis: 8496us [8us] (54.63%; 54.63%)
-            FoldConstant: 8489us [1739us] (54.58%; 99.91%)
-                    InferType: 6749us [6749us] (43.40%; 79.51%)
+    InferType: 6659us [6659us] (45.27%; 45.27%)
+    FoldScaleAxis: 8050us [5us] (54.73%; 54.73%)
+            FoldConstant: 8045us [1685us] (54.69%; 99.94%)
+                    InferType: 6360us [6360us] (43.24%; 79.06%)
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
index 9fce9ab70e..ee58a1e379 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
@@ -340,7 +340,7 @@ latency of convolution.
 
  .. code-block:: none
 
-    Convolution: 34.209918 ms
+    Convolution: 46.161792 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
index 93ff2e1ade..9b46399e0d 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
@@ -659,7 +659,7 @@ be able to run on our build server
 
  .. code-block:: none
 
-    conv2d with tensor core: 13.365565 ms
+    conv2d with tensor core: 12.319292 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
index 6bbb09139e..a5a2d2c708 100644
--- a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
@@ -143,8 +143,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 
  .. code-block:: none
 
-    Numpy running time: 0.020019
-    Baseline: 3.320755
+    Numpy running time: 0.018783
+    Baseline: 3.524702
 
 
 
@@ -239,7 +239,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 
  .. code-block:: none
 
-    Opt1: 0.330129
+    Opt1: 0.290261
 
 
 
@@ -342,7 +342,7 @@ In this tutorial, we chose to vectorize the inner loop row data since it is cach
 
  .. code-block:: none
 
-    Opt2: 0.355017
+    Opt2: 0.330394
 
 
 
@@ -438,7 +438,7 @@ the access pattern for A matrix is more cache friendly.
 
  .. code-block:: none
 
-    Opt3: 0.131375
+    Opt3: 0.114227
 
 
 
@@ -563,7 +563,7 @@ flattening.
 
  .. code-block:: none
 
-    Opt4: 0.111834
+    Opt4: 0.108937
 
 
 
@@ -685,7 +685,7 @@ write to C when all the block results are ready.
 
  .. code-block:: none
 
-    Opt5: 0.112285
+    Opt5: 0.115411
 
 
 
@@ -810,7 +810,7 @@ Furthermore, we can also utilize multi-core processors to do the thread-level pa
 
  .. code-block:: none
 
-    Opt6: 0.148514
+    Opt6: 0.153141
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
index 1f5f4e542c..d72561bdfa 100644
--- a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
@@ -5,12 +5,12 @@
 
 Computation times
 =================
-**00:35.640** total execution time for **how_to_optimize_operators** files:
+**00:35.473** total execution time for **how_to_optimize_operators** files:
 
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:33.094 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)                       | 00:32.828 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.489 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``) | 00:01.516 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.057 | 0.0 MB |
+| :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)             | 00:01.129 | 0.0 MB |
 +-----------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
index 14a29c4359..8f8629ecd1 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
@@ -5,18 +5,18 @@
 
 Computation times
 =================
-**09:29.896** total execution time for **how_to_tune_with_autoscheduler** files:
+**09:11.761** total execution time for **how_to_tune_with_autoscheduler** files:
 
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 05:46.298 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``) | 05:45.185 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:35.123 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)             | 01:31.589 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 01:04.854 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)           | 01:02.315 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:39.686 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)               | 00:29.958 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:12.324 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)             | 00:11.800 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:11.611 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)           | 00:10.915 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
index f44b9bfa5f..bf8c2b868f 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
@@ -771,7 +771,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 0.356 ms
+    Execution time of this operator: 0.353 ms
 
 
 
@@ -1378,7 +1378,7 @@ In the example below we resume the status and do more 5 trials.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 5 minutes  46.298 seconds)
+   **Total running time of the script:** ( 5 minutes  45.185 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
index c028a77bd9..2715f25e01 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
@@ -643,7 +643,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-       8.1887       8.1887       8.1938       8.1837       0.0041   
+       8.2103       8.2062       8.2239       8.2007       0.0099   
                
 
 
@@ -671,7 +671,7 @@ Other Tips
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  4.854 seconds)
+   **Total running time of the script:** ( 1 minutes  2.315 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_cuda.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
index d90c496939..9680e5cf79 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
@@ -662,7 +662,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      765.1206     764.9800     765.5755     764.8065      0.3293   
+      755.0235     755.1044     755.4110     754.5552      0.3540   
                
 
 
@@ -690,7 +690,7 @@ Other Tips
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  35.123 seconds)
+   **Total running time of the script:** ( 1 minutes  31.589 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_x86.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
index a96cc66baf..58f79524ae 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
@@ -386,121 +386,76 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
                  placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
                  compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
       buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-      preflattened_buffer_map = {placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_9: placeholder_16: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_5: placeholder_18: Buffer(placeholder_10, float32, [128, 256], []), placeholder_8: placeholder_19: Buffer(placeholder_13, int32, [33], [])} {
+      preflattened_buffer_map = {placeholder_6: placeholder_15: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_7: placeholder_16: Buffer(placeholder_12, int32, [4916], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_18: Buffer(placeholder_10, float32, [128, 256], []), placeholder_9: placeholder_19: Buffer(placeholder_14, float32, [128, 512], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], [])} {
       for (i0.outer.i1.outer.fused: int32, 0, 256) "parallel" {
-        allocate(compute_4: Pointer(global float32), float32, [256]), storage_scope = global {
-          for (i.outer.inner: int32, 0, 4) {
-            for (nb_j.inner: int32, 0, 2) {
-              let cse_var_2: int32 = ((i.outer.inner*64) + (nb_j.inner*16))
-              let cse_var_1: int32 = ((floormod(i0.outer.i1.outer.fused, 16)*2) + nb_j.inner)
+        allocate(compute_4: Pointer(global float32), float32, [512]), storage_scope = global {
+          for (i.inner.init: int32, 0, 32) {
+            let cse_var_1: int32 = (i.inner.init*16)
+             {
+              compute_5: Buffer(compute_4, float32, [512], [])[cse_var_1] = 0f32
+              compute_5[(cse_var_1 + 1)] = 0f32
+              compute_5[(cse_var_1 + 2)] = 0f32
+              compute_5[(cse_var_1 + 3)] = 0f32
+              compute_5[(cse_var_1 + 4)] = 0f32
+              compute_5[(cse_var_1 + 5)] = 0f32
+              compute_5[(cse_var_1 + 6)] = 0f32
+              compute_5[(cse_var_1 + 7)] = 0f32
+              compute_5[(cse_var_1 + 8)] = 0f32
+              compute_5[(cse_var_1 + 9)] = 0f32
+              compute_5[(cse_var_1 + 10)] = 0f32
+              compute_5[(cse_var_1 + 11)] = 0f32
+              compute_5[(cse_var_1 + 12)] = 0f32
+              compute_5[(cse_var_1 + 13)] = 0f32
+              compute_5[(cse_var_1 + 14)] = 0f32
+              compute_5[(cse_var_1 + 15)] = 0f32
+            }
+          }
+          for (elem_idx: int32, 0, let cse_var_2: int32 = floordiv(floormod(i0.outer.i1.outer.fused, 64), 2) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
+            for (i.inner: int32, 0, 32) {
+              let cse_var_21: int32 = (i.inner*16)
+              let cse_var_20: int32 = (elem_idx*16)
+              let cse_var_19: int32 = floordiv(floormod(i0.outer.i1.outer.fused, 64), 2)
+              let cse_var_18: int32 = (cse_var_21 + 9)
+              let cse_var_17: int32 = (cse_var_21 + 8)
+              let cse_var_16: int32 = (cse_var_21 + 7)
+              let cse_var_15: int32 = (cse_var_21 + 6)
+              let cse_var_14: int32 = (cse_var_21 + 5)
+              let cse_var_13: int32 = (cse_var_21 + 4)
+              let cse_var_12: int32 = (cse_var_21 + 3)
+              let cse_var_11: int32 = (cse_var_21 + 2)
+              let cse_var_10: int32 = (cse_var_21 + 15)
+              let cse_var_9: int32 = (cse_var_21 + 14)
+              let cse_var_8: int32 = (cse_var_21 + 13)
+              let cse_var_7: int32 = (cse_var_21 + 12)
+              let cse_var_6: int32 = (cse_var_21 + 11)
+              let cse_var_5: int32 = (cse_var_21 + 10)
+              let cse_var_4: int32 = (cse_var_21 + 1)
+              let cse_var_3: int32 = ((floordiv(i0.outer.i1.outer.fused, 64)*8192) + (i.inner*256))
                {
-                compute_5: Buffer(compute_4, float32, [256], [])[cse_var_2] = 0f32
-                compute_5[(cse_var_2 + 1)] = 0f32
-                compute_5[(cse_var_2 + 2)] = 0f32
-                compute_5[(cse_var_2 + 3)] = 0f32
-                compute_5[(cse_var_2 + 4)] = 0f32
-                compute_5[(cse_var_2 + 5)] = 0f32
-                compute_5[(cse_var_2 + 6)] = 0f32
-                compute_5[(cse_var_2 + 7)] = 0f32
-                compute_5[(cse_var_2 + 8)] = 0f32
-                compute_5[(cse_var_2 + 9)] = 0f32
-                compute_5[(cse_var_2 + 10)] = 0f32
-                compute_5[(cse_var_2 + 11)] = 0f32
-                compute_5[(cse_var_2 + 12)] = 0f32
-                compute_5[(cse_var_2 + 13)] = 0f32
-                compute_5[(cse_var_2 + 14)] = 0f32
-                compute_5[(cse_var_2 + 15)] = 0f32
-                compute_5[(cse_var_2 + 32)] = 0f32
-                compute_5[(cse_var_2 + 33)] = 0f32
-                compute_5[(cse_var_2 + 34)] = 0f32
-                compute_5[(cse_var_2 + 35)] = 0f32
-                compute_5[(cse_var_2 + 36)] = 0f32
-                compute_5[(cse_var_2 + 37)] = 0f32
-                compute_5[(cse_var_2 + 38)] = 0f32
-                compute_5[(cse_var_2 + 39)] = 0f32
-                compute_5[(cse_var_2 + 40)] = 0f32
-                compute_5[(cse_var_2 + 41)] = 0f32
-                compute_5[(cse_var_2 + 42)] = 0f32
-                compute_5[(cse_var_2 + 43)] = 0f32
-                compute_5[(cse_var_2 + 44)] = 0f32
-                compute_5[(cse_var_2 + 45)] = 0f32
-                compute_5[(cse_var_2 + 46)] = 0f32
-                compute_5[(cse_var_2 + 47)] = 0f32
-                for (elem_idx: int32, 0, (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
-                  let cse_var_35: int32 = (elem_idx*16)
-                  let cse_var_34: int32 = (cse_var_2 + 9)
-                  let cse_var_33: int32 = (cse_var_2 + 8)
-                  let cse_var_32: int32 = (cse_var_2 + 7)
-                  let cse_var_31: int32 = (cse_var_2 + 6)
-                  let cse_var_30: int32 = (cse_var_2 + 5)
-                  let cse_var_29: int32 = (cse_var_2 + 47)
-                  let cse_var_28: int32 = (cse_var_2 + 46)
-                  let cse_var_27: int32 = (cse_var_2 + 45)
-                  let cse_var_26: int32 = (cse_var_2 + 44)
-                  let cse_var_25: int32 = (cse_var_2 + 43)
-                  let cse_var_24: int32 = (cse_var_2 + 42)
-                  let cse_var_23: int32 = (cse_var_2 + 41)
-                  let cse_var_22: int32 = (cse_var_2 + 40)
-                  let cse_var_21: int32 = (cse_var_2 + 4)
-                  let cse_var_20: int32 = (cse_var_2 + 39)
-                  let cse_var_19: int32 = (cse_var_2 + 38)
-                  let cse_var_18: int32 = (cse_var_2 + 37)
-                  let cse_var_17: int32 = (cse_var_2 + 36)
-                  let cse_var_16: int32 = (cse_var_2 + 35)
-                  let cse_var_15: int32 = (cse_var_2 + 34)
-                  let cse_var_14: int32 = (cse_var_2 + 33)
-                  let cse_var_13: int32 = (cse_var_2 + 32)
-                  let cse_var_12: int32 = (cse_var_2 + 3)
-                  let cse_var_11: int32 = (cse_var_2 + 2)
-                  let cse_var_10: int32 = (cse_var_2 + 15)
-                  let cse_var_9: int32 = (cse_var_2 + 14)
-                  let cse_var_8: int32 = (cse_var_2 + 13)
-                  let cse_var_7: int32 = (cse_var_2 + 12)
-                  let cse_var_6: int32 = (cse_var_2 + 11)
-                  let cse_var_5: int32 = (cse_var_2 + 10)
-                  let cse_var_4: int32 = (cse_var_2 + 1)
-                  let cse_var_3: int32 = ((floordiv(i0.outer.i1.outer.fused, 16)*2048) + (i.outer.inner*512))
-                   {
-                    compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[((placeholder_3[cse_var_1]*16) + cse_var_35)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 1)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 2)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 3)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_21] = (compute_5[cse_var_21] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 4)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_30] = (compute_5[cse_var_30] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 5)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_31] = (compute_5[cse_var_31] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 6)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_32] = (compute_5[cse_var_32] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 7)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_33] = (compute_5[cse_var_33] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 8)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_34] = (compute_5[cse_var_34] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 9)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 10)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 11)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 12)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 13)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 14)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 15)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                    compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[((placeholder_3[cse_var_1]*16) + cse_var_35)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 1)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 2)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 3)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 4)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 5)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_19] = (compute_5[cse_var_19] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 6)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_20] = (compute_5[cse_var_20] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 7)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_22] = (compute_5[cse_var_22] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 8)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_23] = (compute_5[cse_var_23] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 9)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_24] = (compute_5[cse_var_24] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 10)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_25] = (compute_5[cse_var_25] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 11)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_26] = (compute_5[cse_var_26] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 12)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_27] = (compute_5[cse_var_27] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 13)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_28] = (compute_5[cse_var_28] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 14)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                    compute_5[cse_var_29] = (compute_5[cse_var_29] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 15)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                  }
-                }
+                compute_5[cse_var_21] = (compute_5[cse_var_21] + (placeholder_1[((placeholder_3[cse_var_19]*16) + cse_var_20)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 1)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 2)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 3)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 4)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 5)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 6)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 7)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 8)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 9)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 10)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 11)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 12)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 13)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 14)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+                compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 15)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
               }
             }
           }
-          for (i0.inner: int32, 0, 8) {
-            let cse_var_36: int32 = (((floordiv(i0.outer.i1.outer.fused, 16)*4096) + (i0.inner*512)) + (floormod(i0.outer.i1.outer.fused, 16)*32))
-            compute[ramp(cse_var_36, 1, 32)] = max((compute_5[ramp((i0.inner*32), 1, 32)] + placeholder_4[ramp(cse_var_36, 1, 32)]), broadcast(0f32, 32))
+          for (i0.inner: int32, 0, 32) {
+            let cse_var_23: int32 = floormod(i0.outer.i1.outer.fused, 64)
+            let cse_var_24: int32 = (cse_var_23*8)
+            let cse_var_22: int32 = (((floordiv(i0.outer.i1.outer.fused, 64)*16384) + (i0.inner*512)) + cse_var_24)
+            compute[ramp(cse_var_22, 1, 8)] = max((compute_5[ramp((((i0.inner*16) + cse_var_24) - (floordiv(cse_var_23, 2)*16)), 1, 8)] + placeholder_4[ramp(cse_var_22, 1, 8)]), broadcast(0f32, 8))
           }
         }
       }
@@ -556,7 +511,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 3.544 ms
+    Execution time of this operator: 3.614 ms
 
 
 
diff --git a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
index 9868b752c0..be5cd4d240 100644
--- a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
@@ -5,16 +5,16 @@
 
 Computation times
 =================
-**00:56.099** total execution time for **how_to_tune_with_autotvm** files:
+**00:39.161** total execution time for **how_to_tune_with_autotvm** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:56.062 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)           | 00:39.126 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.021 | 0.0 MB |
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)               | 00:00.020 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)             | 00:00.005 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)               | 00:00.005 | 0.0 MB |
-+--------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``) | 00:00.005 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
+| :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)               | 00:00.005 | 0.0 MB |
++--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
index 9a96aa7d94..cf1d143edd 100644
--- a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
@@ -387,10 +387,9 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 16, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2821466
-    No: 2   GFLOPS: 6.97/6.97       result: MeasureResult(costs=(0.03322158425,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.620582342147827, timestamp=1668549250.7313094)       [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 32, 1]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,8345205
-    No: 3   GFLOPS: 12.53/12.53     result: MeasureResult(costs=(0.018477077333333335,), error_no=MeasureErrorNo.NO_ERROR, all_cost=10.270890235900879, timestamp=1668549252.4500968)       [('tile_f', [-1, 8, 8, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10272490
-    No: 4   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 4, 1]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 16, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3954303
+    No: 2   GFLOPS: 154.18/154.18   result: MeasureResult(costs=(0.0015014968333333333,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.9308342933654785, timestamp=1668556463.7666306)      [('tile_f', [-1, 8, 8, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 2, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,264690
+    No: 3   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -512,161 +511,132 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 16, 8]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6692404
-    No: 5   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
-      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 738, in __call__
-        yield remote, remote.load_module(os.path.split(build_result.filename)[1])
-      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 702, in run_through_rpc
-        costs = time_f(*args).results
-      File "/workspace/python/tvm/runtime/module.py", line 357, in evaluator
-        blob = feval(*args)
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 1, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 16]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,903288
+    No: 4   GFLOPS: 150.94/154.18   result: MeasureResult(costs=(0.0015337318636363638,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.028738498687744, timestamp=1668556465.3167734)       [('tile_f', [-1, 1, 2, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4552304
+    No: 5   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
+      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
+        func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
+      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
+        func = build(s, args, target_host=task.target_host, runtime=runtime)
+      File "/workspace/python/tvm/driver/build_module.py", line 227, in build
+        input_mod = lower(inputs, args, name=name, binds=binds)
+      File "/workspace/python/tvm/driver/build_module.py", line 134, in lower
+        return ffi.lower_schedule(inp, args, name, binds, simple_mode)
       File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
-      File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
-      File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
+      File "tvm/_ffi/_cython/./packed_func.pxi", line 276, in tvm._ffi._cy3.core.FuncCall
       File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
     tvm._ffi.base.TVMError: Traceback (most recent call last):
-      4: TVMFuncCall
+      24: TVMFuncCall
             at ../src/runtime/c_runtime_api.cc:477
-      3: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+      23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
             at ../include/tvm/runtime/packed_func.h:1217
-      2: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
-            at ../src/runtime/rpc/rpc_module.cc:129
-      1: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
-            at ../src/runtime/rpc/rpc_endpoint.cc:1012
-      0: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
-            at ../src/runtime/rpc/rpc_endpoint.cc:804
-      File "../src/runtime/rpc/rpc_endpoint.cc", line 804
-    TVMError: 
-    ---------------------------------------------------------------
-    An error occurred during the execution of TVM.
-    For more information, please see: https://tvm.apache.org/docs/errors.html
-    ---------------------------------------------------------------
-      Check failed: (code == RPCCode::kReturn) is false: code=kShutdown
-
-    During handling of the above exception, another exception occurred:
-
-    Traceback (most recent call last):
-      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 702, in run_through_rpc
-        costs = time_f(*args).results
-      File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
-        self.gen.throw(type, value, traceback)
-      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 742, in __call__
-        remote.remove(build_result.filename)
-      File "/workspace/python/tvm/rpc/client.py", line 144, in remove
-        self._remote_funcs["remove"] = self.get_function("tvm.rpc.server.remove")
-      File "/workspace/python/tvm/rpc/client.py", line 72, in get_function
-        return self._sess.get_function(name)
-      File "/workspace/python/tvm/runtime/module.py", line 171, in get_function
-        self.handle, c_str(name), ctypes.c_int(query_imports), ctypes.byref(ret_handle)
-      File "/workspace/python/tvm/_ffi/base.py", line 348, in check_call
-        raise get_last_ffi_error()
-    tvm._ffi.base.TVMError: Traceback (most recent call last):
-      52: 0xffffffffffffffff
-      51: _start
-      50: __libc_start_main
-      49: _Py_UnixMain
-      48: 0x0000000000650da0
-      47: 0x0000000000650afa
-      46: _PyFunction_FastCallDict
-      45: _PyEval_EvalCodeWithName
-      44: _PyEval_EvalFrameDefault
-      43: _PyFunction_FastCallKeywords
-      42: _PyEval_EvalCodeWithName
-      41: _PyEval_EvalFrameDefault
-      40: _PyMethodDef_RawFastCallKeywords
-      39: 0x0000000000546369
-      38: _PyEval_EvalCodeWithName
-      37: _PyEval_EvalFrameDefault
-      36: _PyFunction_FastCallKeywords
-      35: _PyEval_EvalCodeWithName
-      34: _PyEval_EvalFrameDefault
-      33: _PyFunction_FastCallDict
-      32: _PyEval_EvalCodeWithName
-      31: _PyEval_EvalFrameDefault
-      30: _PyObject_FastCallDict
-      29: 0x00000000004c06e1
-      28: _PyFunction_FastCallDict
-      27: _PyEval_EvalFrameDefault
-      26: _PyMethodDescr_FastCallKeywords
-      25: 0x00000000005dcb58
-      24: 0x00000000005dc83f
-      23: 0x00000000004ba127
-      22: _PyEval_EvalFrameDefault
-      21: _PyFunction_FastCallKeywords
-      20: _PyEval_EvalFrameDefault
-      19: _PyFunction_FastCallKeywords
-      18: _PyEval_EvalFrameDefault
-      17: _PyFunction_FastCallKeywords
-      16: _PyEval_EvalCodeWithName
-      15: _PyEval_EvalFrameDefault
-      14: 0x0000000000537c30
-      13: _PyObject_FastCallKeywords
-      12: 0x00007fe2cb1f3fa2
-      11: _ctypes_callproc
-      10: ffi_call
-      9: ffi_call_unix64
-      8: TVMModGetFunction
-            at ../src/runtime/c_runtime_api.cc:408
-      7: tvm::runtime::ModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
-            at ../src/runtime/module.cc:66
-      6: tvm::runtime::RPCModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)
-            at ../src/runtime/rpc/rpc_module.cc:185
-      5: tvm::runtime::RPCClientSession::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
-            at ../src/runtime/rpc/rpc_endpoint.cc:1007
-      4: tvm::runtime::TVMRetValue tvm::runtime::RPCEndpoint::SysCallRemote<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(tvm::runtime::RPCCode, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
-            at ../src/runtime/rpc/rpc_endpoint.h:223
-      3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const
+      22: Call
+            at ../include/tvm/runtime/packed_func.h:1213
+      21: operator()
+            at ../include/tvm/runtime/packed_func.h:1731
+      20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
+            at ../include/tvm/runtime/packed_func.h:1671
+      19: run<>
+            at ../include/tvm/runtime/packed_func.h:1631
+      18: run<tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1646
+      13: operator()
+            at ../src/driver/driver_api.cc:388
+      12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, tvm::GlobalVarSupply, bool)
+            at ../src/driver/driver_api.cc:374
+      11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
+            at ../src/driver/driver_api.cc:269
+      10: tvm::transform::Pass::operator()(tvm::IRModule) const
+            at ../src/ir/transform.cc:258
+      9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:274
+      8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:453
+      7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:274
+      6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/tir/ir/transform.cc:100
+      5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
+            at ../include/tvm/runtime/packed_func.h:1750
+      4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
+            at ../include/tvm/runtime/packed_func.h:1694
+      3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
             at ../include/tvm/runtime/packed_func.h:1618
       2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
             at ../include/tvm/runtime/packed_func.h:1217
       1: Call
             at ../include/tvm/runtime/packed_func.h:1213
       0: operator()
-            at ../src/runtime/rpc/rpc_endpoint.cc:684
-      File "../src/runtime/rpc/rpc_endpoint.cc", line 684
-    TVMError: 
-    ---------------------------------------------------------------
-    An error occurred during the execution of TVM.
-    For more information, please see: https://tvm.apache.org/docs/errors.html
-    ---------------------------------------------------------------
-      Check failed: (code == RPCCode::kReturn) is false: code=1
+            at ../src/runtime/c_runtime_api.cc:534
+      File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
+      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
+        raise InstantiationError("Skipped because of invalid gpu kernel")
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
 
     Traceback (most recent call last):
-      52: 0xffffffffffffffff
-      51: _start
-      50: __libc_start_main
-      49: _Py_UnixMain
-      48: 0x0000000000650da0
-      47: 0x0000000000650afa
-      46: _PyFunction_FastCallDict
-      45: _PyEval_EvalCodeWithName
-      44: _PyEval_EvalFrameDefault
-      43: _PyFunction_FastCallKeywords
-      42: _PyEval_EvalCodeWithName
-      41: _PyEval_EvalFrameDefault
-      40: _PyMethodDef_RawFastCallKeywords
-      39: 0x0000000000546369
-      38: _PyEval_EvalCodeWithName
-      37: _PyEval_EvalFrameDefault
-      36: _PyFunction_FastCallKeywords
-      35: _PyEval_EvalCodeWithName
-      34: _PyEval_EvalFrameDefault
-      33: _PyFunction_FastCallDict
-      32: _PyEval_EvalCodeWithName
-      31: _PyEval_EvalFrameDefault
-      30: _PyObject_FastCallDict
-      29: 0x00000000004c06e1
-      28: _PyFunction_FastCallDict
-      27: _PyEval_EvalFrameDefault
-      26: _PyMethodDescr_FastCallKeywords
-      25: 0x00000000005dcb58
-      24: 0x00000000005dc83f
-      23: 0x00000000004ba127
-      22: _PyEval_EvalFrameDefault
-      21: _PyFunction_FastCallKeywords
-      20: _PyEval_EvalFrameDefault
-      19: _PyFunction_FastCall      [('tile_f', [-1, 8, 1, 16]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 8, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6790027
-    No: 6   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+      24: TVMFuncCall
+            at ../src/runtime/c_runtime_api.cc:477
+      23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+            at ../include/tvm/runtime/packed_func.h:1217
+      22: Call
+            at ../include/tvm/runtime/packed_func.h:1213
+      21: operator()
+            at ../include/tvm/runtime/packed_func.h:1731
+      20: unpack_call<tvm::IRModule, 5, tvm::<lambda(tvm::te::Schedule, const tvm::runtime::Array<tvm::runtime::ObjectRef>&, const tvm::runtime::String&, const tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer>&, bool)> >
+            at ../include/tvm/runtime/packed_func.h:1671
+      19: run<>
+            at ../include/tvm/runtime/packed_func.h:1631
+      18: run<tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      17: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      16: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      15: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1631
+      14: run<tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_>
+            at ../include/tvm/runtime/packed_func.h:1646
+      13: operator()
+            at ../src/driver/driver_api.cc:388
+      12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, tvm::GlobalVarSupply, bool)
+            at ../src/driver/driver_api.cc:374
+      11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
+            at ../src/driver/driver_api.cc:269
+      10: tvm::transform::Pass::operator()(tvm::IRModule) const
+            at ../src/ir/transform.cc:258
+      9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:274
+      8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:453
+      7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/ir/transform.cc:274
+      6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
+            at ../src/tir/ir/transform.cc:100
+      5: tvm::runtime::TypedPackedFunc<tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)>::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
+            at ../include/tvm/runtime/packed_func.h:1750
+      4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher<tvm::tir::PrimFunc>::run<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::runtime::PackedFunc const&, tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&)
+            at ../include/tvm/runtime/packed_func.h:1694
+      3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const
+            at ../include/tvm/runtime/packed_func.h:1618
+      2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+            at ../include/tvm/runtime/packed_func.h:1217
+      1: Call
+            at ../include/tvm/runtime/packed_func.h:1213
+      0: operator()
+            at ../src/runtime/c_runtime_api.cc:534
+      File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
+      File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
+        raise InstantiationError("Skipped because of invalid gpu kernel")
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 4, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 256]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,8512482
+    No: 6   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -788,8 +758,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 32, 1]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 2, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2083444
-    No: 7   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 256, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9782088
+    No: 7   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -911,8 +881,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 1, 32]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7327507
-    No: 8   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 32, 16, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 4]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5304899
+    No: 8   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1034,8 +1004,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10433677
-    No: 9   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 8, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 16, 16]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4587081
+    No: 9   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1157,8 +1127,10 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 256, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,731288
-    No: 10  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 32, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 64]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,8876210
+    No: 10  GFLOPS: 6.57/154.18     result: MeasureResult(costs=(0.035245632,), error_no=MeasureErrorNo.NO_ERROR, all_cost=7.224449396133423, timestamp=1668556474.7894113) [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,1850606
+    No: 11  GFLOPS: 17.15/154.18    result: MeasureResult(costs=(0.013495552,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.2125194072723389, timestamp=1668556475.4980066)        [('tile_f', [-1, 2, 4, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2455000
+    No: 12  GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1280,9 +1252,11 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 4, 2]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 128]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1338992
-    No: 11  GFLOPS: 5.38/12.53      result: MeasureResult(costs=(0.04300253675,), error_no=MeasureErrorNo.NO_ERROR, all_cost=3.1535797119140625, timestamp=1668549263.3956833)      [('tile_f', [-1, 32, 1, 2]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2139120
-    No: 12  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 128, 2]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 2, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1696078
+    No: 13  GFLOPS: 174.33/174.33   result: MeasureResult(costs=(0.001327968394736842,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6654844284057617, timestamp=1668556479.535505)        [('tile_f', [-1, 1, 8, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,8171801
+    No: 14  GFLOPS: 3.08/174.33     result: MeasureResult(costs=(0.07524625374999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.3694047927856445, timestamp=1668556480.8263621)        [('tile_f', [-1, 128, 2, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 1, 4]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,1811937
+    No: 15  GFLOPS: 21.73/174.33    result: MeasureResult(costs=(0.0106537506,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.669083833694458, timestamp=1668556481.527801) [('tile_f', [-1, 8, 2, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 2, 16]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7286413
+    No: 16  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1404,8 +1378,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 1, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4981589
-    No: 13  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 8, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 32, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10313724
+    No: 17  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1527,9 +1501,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 8, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 32, 2]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10120209
-    No: 14  GFLOPS: 3.18/12.53      result: MeasureResult(costs=(0.072806816,), error_no=MeasureErrorNo.NO_ERROR, all_cost=9.520730257034302, timestamp=1668549273.1264057) [('tile_f', [-1, 32, 4, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9878560
-    No: 15  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 1, 16]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 4]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2991064
+    No: 18  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1651,9 +1624,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 64, 4, 1]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7912545
-    No: 16  GFLOPS: 18.76/18.76     result: MeasureResult(costs=(0.012339853888888887,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.854473352432251, timestamp=1668549273.8730075)        [('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4268553
-    No: 17  GFLOPS: 0.00/18.76      result: Traceback (most recent call last):
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 256, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2, 8]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4940373
+    No: 19  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 588, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 540, in _build_func_common
@@ -1775,10 +1747,8 @@ for this template
       File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 871, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
-    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 8, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 128, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,449582
-    No: 18  GFLOPS: 262.69/262.69   result: MeasureResult(costs=(0.0008812803908045978,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.1055493354797363, timestamp=1668549280.0131285)      [('tile_f', [-1, 2, 16, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,8983955
-    No: 19  GFLOPS: 70.02/262.69    result: MeasureResult(costs=(0.0033061094166666667,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.018292188644409, timestamp=1668549280.9508169)       [('tile_f', [-1, 1, 4, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 16, 4]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,1824415
-    No: 20  GFLOPS: 82.93/262.69    result: MeasureResult(costs=(0.002791370583333333,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.6083526611328125, timestamp=1668549281.647997)        [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 2]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9717133
+    tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 16, 8, 2]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 128, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,5639343
+    No: 20  GFLOPS: 195.91/195.91   result: MeasureResult(costs=(0.0011816482499999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=3.031599521636963, timestamp=1668556484.7983866)       [('tile_f', [-1, 2, 4, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9973113
 
 
 
@@ -1833,9 +1803,9 @@ and measure running time.
     Finish loading 20 records
 
     Best config:
-    [('tile_f', [-1, 2, 16, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,8983955
+    [('tile_f', [-1, 2, 4, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 2, 8]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9973113
     Finish loading 20 records
-    Time cost of this operator: 0.001286
+    Time cost of this operator: 0.001614
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
index f59aebc8d6..c8cf859283 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
@@ -327,10 +327,10 @@ Timing the untuned program
     ########## Build without Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
     ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  309.4     98.723   (1, 2, 10, 10, 3)  2       1        [309.4]           
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.048     0.973    (1, 6, 10, 10)     1       1        [3.048]           
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.955     0.305    (1, 1, 10, 10, 3)  1       1        [0.955]           
-    Total_time                                    -                                             313.403   -        -                  -       -        -                 
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  312.3     98.715   (1, 2, 10, 10, 3)  2       1        [312.3]           
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.073     0.971    (1, 6, 10, 10)     1       1        [3.073]           
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.991     0.313    (1, 1, 10, 10, 3)  1       1        [0.991]           
+    Total_time                                    -                                             316.364   -        -                  -       -        -                 
 
 
 
@@ -394,10 +394,10 @@ Timing the tuned program
     ########## Build with Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)  
     ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  101.1     97.398   (1, 6, 10, 10, 1)  2       1        [101.1]           
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.745     1.681    (1, 6, 10, 10)     1       1        [1.745]           
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.956     0.921    (1, 1, 10, 10, 3)  1       1        [0.956]           
-    Total_time                                    -                                             103.801   -        -                  -       -        -                 
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  103.1     97.424   (1, 6, 10, 10, 1)  2       1        [103.1]           
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.773     1.675    (1, 6, 10, 10)     1       1        [1.773]           
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.953     0.901    (1, 1, 10, 10, 3)  1       1        [0.953]           
+    Total_time                                    -                                             105.826   -        -                  -       -        -                 
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_pytorch.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_pytorch.rst.txt
index ed52535219..11d340099e 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_pytorch.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_pytorch.rst.txt
@@ -109,7 +109,7 @@ download a cat image and preprocess it to use as the model input.
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torch/ao/quantization/utils.py:281: UserWarning: must run observer before calling calculate_qparams. Returning default values.
       "must run observer before calling calculate_qparams. " +
     Downloading: "https://download.pytorch.org/models/quantized/mobilenet_v2_qnnpack_37f702c5.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2_qnnpack_37f702c5.pth
-
      0%|          | 0.00/3.42M [00:00<?, ?B/s]
    100%|##########| 3.42M/3.42M [00:00<00:00, 73.3MB/s]
+
      0%|          | 0.00/3.42M [00:00<?, ?B/s]
    100%|##########| 3.42M/3.42M [00:00<00:00, 50.4MB/s]
     /workspace/python/tvm/relay/frontend/pytorch_utils.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
       return LooseVersion(torch_ver) > ver
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
@@ -314,7 +314,7 @@ Look up prediction top 1 index in 1000 class synset.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  5.538 seconds)
+   **Total running time of the script:** ( 1 minutes  1.655 seconds)
 
 
 .. _sphx_glr_download_how_to_work_with_microtvm_micro_pytorch.py:
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
index 0ed5624b42..2f93181f1f 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
@@ -225,7 +225,7 @@ take about **2 minutes** to download the Stanford Cars, while COCO 2017 validati
  .. code-block:: none
 
 
-    '/tmp/tmpjf6_zkcp/images/random'
+    '/tmp/tmp6_yzhb7f/images/random'
 
 
 
@@ -316,7 +316,7 @@ objects to other stuff? We can display some examples from our datasets using ``m
 
 
 .. image-sg:: /how_to/work_with_microtvm/images/sphx_glr_micro_train_001.png
-   :alt: [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
+   :alt: [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]
    :srcset: /how_to/work_with_microtvm/images/sphx_glr_micro_train_001.png
    :class: sphx-glr-single-img
 
@@ -325,8 +325,8 @@ objects to other stuff? We can display some examples from our datasets using ``m
 
  .. code-block:: none
 
-    /tmp/tmpjf6_zkcp/images/target contains 8144 images
-    /tmp/tmpjf6_zkcp/images/random contains 5000 images
+    /tmp/tmp6_yzhb7f/images/target contains 8144 images
+    /tmp/tmp6_yzhb7f/images/random contains 5000 images
 
 
 
@@ -501,13 +501,13 @@ the time on our validation set).
  .. code-block:: none
 
     Epoch 1/3
-    328/328 - 47s - loss: 0.2144 - accuracy: 0.9250 - val_loss: 0.1710 - val_accuracy: 0.9430 - 47s/epoch - 144ms/step
+    328/328 - 47s - loss: 0.2093 - accuracy: 0.9259 - val_loss: 0.1729 - val_accuracy: 0.9362 - 47s/epoch - 143ms/step
     Epoch 2/3
-    328/328 - 44s - loss: 0.0949 - accuracy: 0.9634 - val_loss: 0.1365 - val_accuracy: 0.9517 - 44s/epoch - 133ms/step
+    328/328 - 44s - loss: 0.1036 - accuracy: 0.9642 - val_loss: 0.1694 - val_accuracy: 0.9434 - 44s/epoch - 133ms/step
     Epoch 3/3
-    328/328 - 44s - loss: 0.0676 - accuracy: 0.9737 - val_loss: 0.1010 - val_accuracy: 0.9679 - 44s/epoch - 133ms/step
+    328/328 - 43s - loss: 0.0639 - accuracy: 0.9754 - val_loss: 0.1391 - val_accuracy: 0.9558 - 43s/epoch - 132ms/step
 
-    <keras.callbacks.History object at 0x7feb7590e390>
+    <keras.callbacks.History object at 0x7f283a6516d0>
 
 
 
@@ -864,7 +864,7 @@ Arduino tutorial for how to do that `on GitHub <https://github.com/guberti/tvm-a
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 4 minutes  41.884 seconds)
+   **Total running time of the script:** ( 4 minutes  48.368 seconds)
 
 
 .. _sphx_glr_download_how_to_work_with_microtvm_micro_train.py:
diff --git a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
index 806b3bfbea..17c07e4229 100644
--- a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
@@ -5,18 +5,18 @@
 
 Computation times
 =================
-**06:51.106** total execution time for **how_to_work_with_microtvm** files:
+**06:50.430** total execution time for **how_to_work_with_microtvm** files:
 
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 04:41.884 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)               | 04:48.368 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_pytorch.py` (``micro_pytorch.py``)           | 01:05.538 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_pytorch.py` (``micro_pytorch.py``)           | 01:01.655 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:51.364 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)         | 00:48.472 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_aot.py` (``micro_aot.py``)                   | 00:08.446 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_aot.py` (``micro_aot.py``)                   | 00:08.237 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.871 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)             | 00:03.695 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_microtvm_micro_reference_vm.py` (``micro_reference_vm.py``) | 00:00.001 | 0.0 MB |
 +---------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
index e1a3659da1..092dbd6f5d 100644
--- a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
@@ -5,14 +5,14 @@
 
 Computation times
 =================
-**00:45.425** total execution time for **how_to_work_with_relay** files:
+**00:42.718** total execution time for **how_to_work_with_relay** files:
 
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_using_pipeline_executor.py` (``using_pipeline_executor.py``) | 00:33.298 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_using_pipeline_executor.py` (``using_pipeline_executor.py``) | 00:31.251 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)           | 00:10.352 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)           | 00:10.121 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                             | 00:01.769 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)                             | 00:01.340 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)                 | 00:00.007 | 0.0 MB |
 +----------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
index 6e834904e4..ae8767ef02 100644
--- a/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/intrin_math.rst.txt
@@ -261,7 +261,7 @@ The following example customizes CUDA lowering rule for :code:`exp`.
  .. code-block:: none
 
 
-    <function my_cuda_math_rule at 0x7feb725c9c20>
+    <function my_cuda_math_rule at 0x7f283a7c7b90>
 
 
 
diff --git a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
index 705c72a7c6..f13ce3cf6c 100644
--- a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
@@ -5,22 +5,22 @@
 
 Computation times
 =================
-**00:07.493** total execution time for **how_to_work_with_schedules** files:
+**00:07.140** total execution time for **how_to_work_with_schedules** files:
 
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:05.115 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)                 | 00:04.899 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:01.017 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)                     | 00:00.967 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.580 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)                     | 00:00.542 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.561 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)                               | 00:00.523 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.119 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)                     | 00:00.113 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``) | 00:00.052 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``) | 00:00.049 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.030 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)                               | 00:00.029 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.020 | 0.0 MB |
+| :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)               | 00:00.019 | 0.0 MB |
 +------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
index c9a991d90f..82305303eb 100644
--- a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
@@ -347,7 +347,7 @@ The importing needs to happen before the tensorized GEMV being executed.
                  C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
       buffer_map = {A_1: A, B_1: B, C_1: C}
       preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmpg0bbiniy/input0.cc'\nsource_filename = \"/tmp/tmpg0bbiniy/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
+      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmpnxfsmwfa/input0.cc'\nsource_filename = \"/tmp/tmpnxfsmwfa/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
       for (i, 0, 1024) {
         for (j.outer: int32, 0, 32) {
           @tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
diff --git a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
index e29ebc154b..da1f72fcae 100644
--- a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:27.288** total execution time for **topic_vta_tutorials_autotvm** files:
+**00:25.907** total execution time for **topic_vta_tutorials_autotvm** files:
 
 +---------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:27.282 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``) | 00:25.900 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)     | 00:00.006 | 0.0 MB |
 +---------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
index 1d5c8b5794..68e1fa253a 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
@@ -289,7 +289,7 @@ The compilation steps are:
       DeprecationWarning,
     /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
       relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-    resnet18_v1 inference graph built in 30.41s!
+    resnet18_v1 inference graph built in 28.60s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
index 1f09a273f3..0e09c7f48c 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
@@ -333,7 +333,7 @@ The compilation steps are:
 
     /workspace/python/tvm/relay/build_module.py:348: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
       DeprecationWarning,
-    yolov3-tiny inference graph built in 20.30s!
+    yolov3-tiny inference graph built in 19.25s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
index 8dd19ef7a6..d0fe2505b5 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**01:42.979** total execution time for **topic_vta_tutorials_frontend** files:
+**01:39.922** total execution time for **topic_vta_tutorials_frontend** files:
 
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:52.646 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)           | 00:51.507 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:50.332 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``) | 00:48.415 | 0.0 MB |
 +------------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
index c67150ddae..31e0e6ffcb 100644
--- a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:03.160** total execution time for **topic_vta_tutorials_optimize** files:
+**00:03.120** total execution time for **topic_vta_tutorials_optimize** files:
 
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:02.707 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)         | 00:02.691 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.454 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``) | 00:00.429 | 0.0 MB |
 +--------------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
index 6b8a7e1c69..9fc84f2943 100644
--- a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:00.818** total execution time for **topic_vta_tutorials** files:
+**00:00.767** total execution time for **topic_vta_tutorials** files:
 
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.439 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``) | 00:00.416 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.379 | 0.0 MB |
+| :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``) | 00:00.351 | 0.0 MB |
 +---------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
index 69fffb438b..1c6a9e4ef7 100644
--- a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
@@ -203,6 +203,13 @@ trials, we can load the best schedule from the log file and apply it.
 
 
 
+.. rst-class:: sphx-glr-script-out
+
+ .. code-block:: none
+
+    *E
+
+
 
 
 
@@ -326,7 +333,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 94.425 ms
+    Execution time of this operator: 97.652 ms
 
 
 
@@ -426,7 +433,7 @@ resume the status and do more 5 trials.
     Resume search:
     /venv/apache-tvm-py3.7/lib/python3.7/site-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
       warnings.warn(f'Old style callback is deprecated.  See: {link}', UserWarning)
-    .T
+    *E
 
 
 
@@ -444,7 +451,7 @@ operations.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  33.374 seconds)
+   **Total running time of the script:** ( 1 minutes  29.784 seconds)
 
 
 .. _sphx_glr_download_tutorial_auto_scheduler_matmul_x86.py:
diff --git a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
index afbf19791e..6016feb505 100644
--- a/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_matmul_x86.rst.txt
@@ -450,16 +450,16 @@ reduce variance, we take 5 measurements and average them.
     waiting for device...
     device available
     Get devices for measurement successfully!
-    No: 1   GFLOPS: 9.56/9.56       result: MeasureResult(costs=(0.028084570999999996,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6163735389709473, timestamp=1668547810.291327)        [('tile_y', [-1, 2]), ('tile_x', [-1, 64])],None,61
-    No: 2   GFLOPS: 12.75/12.75     result: MeasureResult(costs=(0.0210513222,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5504179000854492, timestamp=1668547810.842157)        [('tile_y', [-1, 8]), ('tile_x', [-1, 512])],None,93
-    No: 3   GFLOPS: 9.92/12.75      result: MeasureResult(costs=(0.0270647714,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.9314725399017334, timestamp=1668547812.2323968)       [('tile_y', [-1, 16]), ('tile_x', [-1, 128])],None,74
-    No: 4   GFLOPS: 1.95/12.75      result: MeasureResult(costs=(0.13787345239999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.365994453430176, timestamp=1668547815.3772676) [('tile_y', [-1, 8]), ('tile_x', [-1, 2])],None,13
-    No: 5   GFLOPS: 2.08/12.75      result: MeasureResult(costs=(0.1292589286,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.2243378162384033, timestamp=1668547817.7326822)       [('tile_y', [-1, 4]), ('tile_x', [-1, 4])],None,22
-    No: 6   GFLOPS: 2.00/12.75      result: MeasureResult(costs=(0.1344603278,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.305664539337158, timestamp=1668547820.0587761)        [('tile_y', [-1, 512]), ('tile_x', [-1, 8])],None,39
-    No: 7   GFLOPS: 2.47/12.75      result: MeasureResult(costs=(0.10846081300000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8945081233978271, timestamp=1668547822.7405117)        [('tile_y', [-1, 8]), ('tile_x', [-1, 4])],None,23
-    No: 8   GFLOPS: 8.35/12.75      result: MeasureResult(costs=(0.0321628392,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.7031145095825195, timestamp=1668547823.469019)        [('tile_y', [-1, 512]), ('tile_x', [-1, 32])],None,59
-    No: 9   GFLOPS: 6.73/12.75      result: MeasureResult(costs=(0.0398940858,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.7958028316497803, timestamp=1668547824.3780992)       [('tile_y', [-1, 512]), ('tile_x', [-1, 128])],None,79
-    No: 10  GFLOPS: 3.64/12.75      result: MeasureResult(costs=(0.07378304599999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.3074233531951904, timestamp=1668547825.7257895)        [('tile_y', [-1, 128]), ('tile_x', [-1, 16])],None,47
+    No: 1   GFLOPS: 13.76/13.76     result: MeasureResult(costs=(0.0195130834,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.4759232997894287, timestamp=1668555113.156249)        [('tile_y', [-1, 256]), ('tile_x', [-1, 64])],None,68
+    No: 2   GFLOPS: 11.50/13.76     result: MeasureResult(costs=(0.0233377094,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5467529296875, timestamp=1668555114.4190085)  [('tile_y', [-1, 128]), ('tile_x', [-1, 32])],None,57
+    No: 3   GFLOPS: 2.42/13.76      result: MeasureResult(costs=(0.11075313839999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.9274091720581055, timestamp=1668555117.0882974)        [('tile_y', [-1, 2]), ('tile_x', [-1, 4])],None,21
+    No: 4   GFLOPS: 9.09/13.76      result: MeasureResult(costs=(0.0295406538,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6850738525390625, timestamp=1668555118.4477098)       [('tile_y', [-1, 16]), ('tile_x', [-1, 32])],None,54
+    No: 5   GFLOPS: 4.17/13.76      result: MeasureResult(costs=(0.0644424472,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.2029130458831787, timestamp=1668555119.8010767)       [('tile_y', [-1, 16]), ('tile_x', [-1, 16])],None,44
+    No: 6   GFLOPS: 10.46/13.76     result: MeasureResult(costs=(0.025660678599999997,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.615302324295044, timestamp=1668555120.375982) [('tile_y', [-1, 8]), ('tile_x', [-1, 64])],None,63
+    No: 7   GFLOPS: 12.69/13.76     result: MeasureResult(costs=(0.021158569,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5504105091094971, timestamp=1668555120.923032) [('tile_y', [-1, 32]), ('tile_x', [-1, 128])],None,75
+    No: 8   GFLOPS: 10.14/13.76     result: MeasureResult(costs=(0.0264689112,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.621466875076294, timestamp=1668555121.5158722)        [('tile_y', [-1, 512]), ('tile_x', [-1, 128])],None,79
+    No: 9   GFLOPS: 12.21/13.76     result: MeasureResult(costs=(0.021977817599999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.46729397773742676, timestamp=1668555122.4011207)      [('tile_y', [-1, 2]), ('tile_x', [-1, 512])],None,91
+    No: 10  GFLOPS: 14.67/14.67     result: MeasureResult(costs=(0.018297431,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.746845006942749, timestamp=1668555122.9000688) [('tile_y', [-1, 64]), ('tile_x', [-1, 64])],None,66
 
 
 
diff --git a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
index 3996242eb8..a852d7f5e4 100644
--- a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
@@ -320,7 +320,7 @@ standard deviation.
 
  .. code-block:: none
 
-    {'mean': 519.9615746099993, 'median': 520.8512474500026, 'std': 2.491058036681814}
+    {'mean': 519.9631207500033, 'median': 519.4895130000077, 'std': 3.823469731553201}
 
 
 
@@ -554,31 +554,32 @@ the tuning data to.
 
  .. code-block:: none
 
-
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:    9.79/  22.52 GFLOPS | Progress: (4/20) | 7.90 s
    [Task  1/25]  Current/Best:   11.15/  22.52 GFLOPS | Progress: (8/20) | 11.13 s
    [Task  1/25]  Current/Best:    9.07/  22.52 GFLOPS | Progress: (12/20) | 14.34 s
    [Task  1/25]  Current/Best:   17.57/  22.52 GFLOPS | Progress: (16/20) | 16.29 s
    [Task  1/25]  Current/Best:   12.88/  22.52 GFLOPS | Progress: (20/20) | 21.99 s Done.
-
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   14.85/  18.91 GFLOPS | Progress: (4/20) | 3.07 s
    [Task  2/25]  Current/Best:   16.42/  18.91 GFLOPS | Progress: (8/20) | 4.24 s
    [Task  2/25]  Current/Best:   11.18/  19.88 GFLOPS | Progress: (12/20) | 6.42 s
    [Task  2/25]  Current/Best:    2.10/  19.88 GFLOPS | Progress: (16/20) | 7.70 s
    [Task  2/25]  Current/Best:    6.23/  19.88 GFLOPS | Progress: (20/20) | 9.01 s Done.
-
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    6.39/  19.47 GFLOPS | Progress: (4/20) | 3.42 s
    [Task  3/25]  Current/Best:   14.31/  19.47 GFLOPS | Progress: (8/20) | 5.62 s
    [Task  3/25]  Current/Best:   15.20/  23.65 GFLOPS | Progress: (12/20) | 7.64 s
    [Task  3/25]  Current/Best:   10.28/  23.65 GFLOPS | Progress: (16/20) | 9.72 s
    [Task  3/25]  Current/Best:   17.80/  23.65 GFLOPS | Progress: (20/20) | 12.05 s Done.
-
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:   12.03/  12.03 GFLOPS | Progress: (4/20) | 6.69 s
    [Task  4/25]  Current/Best:   10.61/  20.12 GFLOPS | Progress: (8/20) | 11.00 s
    [Task  4/25]  Current/Best:   11.30/  20.12 GFLOPS | Progress: (12/20) | 13.39 s
    [Task  4/25]  Current/Best:    6.16/  20.12 GFLOPS | Progress: (16/20) | 15.36 s
    [Task  4/25]  Current/Best:    2.94/  22.31 GFLOPS | Progress: (20/20) | 17.44 s Done.
-
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    8.69/  13.85 GFLOPS | Progress: (4/20) | 4.75 s
    [Task  5/25]  Current/Best:   17.55/  19.35 GFLOPS | Progress: (8/20) | 6.40 s
    [Task  5/25]  Current/Best:   14.33/  19.35 GFLOPS | Progress: (12/20) | 8.80 s
    [Task  5/25]  Current/Best:   12.94/  19.35 GFLOPS | Progress: (16/20) | 10.94 s
    [Task  5/25]  Current/Best:   17.42/  19.35 GFLOPS | Progress: (20/20) | 12.75 s Done.
-
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.05/  21.19 GFLOPS | Progress: (4/20) | 5.53 s
    [Task  6/25]  Current/Best:   19.68/  21.19 GFLOPS | Progress: (8/20) | 8.14 s
    [Task  6/25]  Current/Best:   17.39/  21.19 GFLOPS | Progress: (12/20) | 9.76 s
    [Task  6/25]  Current/Best:   18.35/  21.19 GFLOPS | Progress: (16/20) | 12.29 s
    [Task  6/25]  Current/Best:   10.49/  21.19 GFLOPS | Progress: (20/20) | 15.77 s Done.
-
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   16.12/  16.12 GFLOPS | Progress: (4/20) | 4.14 s
    [Task  7/25]  Current/Best:   18.22/  18.51 GFLOPS | Progress: (8/20) | 5.79 s
    [Task  7/25]  Current/Best:   12.01/  18.51 GFLOPS | Progress: (12/20) | 8.79 s
    [Task  7/25]  Current/Best:    5.22/  18.51 GFLOPS | Progress: (16/20) | 11.32 s
    [Task  7/25]  Current/Best:   16.70/  21.32 GFLOPS | Progress: (20/20) | 13.00 s Done.
-
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:    6.73/  14.93 GFLOPS | Progress: (4/20) | 3.58 s
    [Task  8/25]  Current/Best:   12.08/  14.93 GFLOPS | Progress: (8/20) | 7.23 s
    [Task  8/25]  Current/Best:   14.05/  14.93 GFLOPS | Progress: (12/20) | 9.22 s
    [Task  8/25]  Current/Best:   11.30/  14.93 GFLOPS | Progress: (16/20) | 12.32 s
    [Task  8/25]  Current/Best:    7.09/  14.93 GFLOPS | Progress: (20/20) | 15.40 s Done.
-
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   16.59/  16.59 GFLOPS | Progress: (4/20) | 2.87 s
    [Task  9/25]  Current/Best:    7.68/  16.59 GFLOPS | Progress: (8/20) | 6.11 s
    [Task  9/25]  Current/Best:    5.14/  17.82 GFLOPS | Progress: (12/20) | 11.52 s
    [Task  9/25]  Current/Best:   13.77/  17.82 GFLOPS | Progress: (16/20) | 13.04 s
    [Task  9/25]  Current/Best:   17.43/  17.82 GFLOPS | Progress: (20/20) | 15.78 s Done.
-
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   15.02/  15.02 GFLOPS | Progress: (4/20) | 5.51 s
    [Task 10/25]  Current/Best:   14.34/  15.02 GFLOPS | Progress: (8/20) | 7.37 s
    [Task 10/25]  Current/Best:   16.98/  20.74 GFLOPS | Progress: (12/20) | 9.77 s
    [Task 10/25]  Current/Best:   18.55/  20.74 GFLOPS | Progress: (16/20) | 11.34 s
    [Task 10/25]  Current/Best:   18.82/  20.74 GFLOPS | Progress: (20/20) | 12.74 s Done.
-
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:    9.37/  21.18 GFLOPS | Progress: (4/20) | 3.30 s
    [Task 11/25]  Current/Best:   10.46/  21.18 GFLOPS | Progress: (8/20) | 6.26 s
    [Task 11/25]  Current/Best:   16.02/  23.84 GFLOPS | Progress: (12/20) | 8.28 s
    [Task 11/25]  Current/Best:   16.43/  23.84 GFLOPS | Progress: (16/20) | 10.55 s
    [Task 11/25]  Current/Best:    9.43/  23.84 GFLOPS | Progress: (20/20) | 12.86 s Done.
-
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    8.36/  15.79 GFLOPS | Progress: (4/20) | 3.76 s
    [Task 12/25]  Current/Best:   11.30/  15.79 GFLOPS | Progress: (8/20) | 6.79 s
    [Task 12/25]  Current/Best:   11.25/  20.28 GFLOPS | Progress: (12/20) | 9.26 s
    [Task 12/25]  Current/Best:   17.29/  20.28 GFLOPS | Progress: (16/20) | 11.26 s
    [Task 12/25]  Current/Best:    4.49/  20.28 GFLOPS | Progress: (20/20) | 15.05 s Done.
-
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:   11.72/  21.57 GFLOPS | Progress: (4/20) | 4.77 s
    [Task 13/25]  Current/Best:   11.82/  22.16 GFLOPS | Progress: (8/20) | 6.59 s
    [Task 13/25]  Current/Best:   12.33/  22.16 GFLOPS | Progress: (12/20) | 10.64 s
    [Task 13/25]  Current/Best:   21.08/  22.16 GFLOPS | Progress: (16/20) | 12.71 s
    [Task 13/25]  Current/Best:    9.97/  22.16 GFLOPS | Progress: (20/20) | 15.49 s Done.
-
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   16.69/  16.69 GFLOPS | Progress: (4/20) | 4.66 s
    [Task 14/25]  Current/Best:   10.59/  16.69 GFLOPS | Progress: (8/20) | 7.12 s
    [Task 14/25]  Current/Best:    6.40/  16.69 GFLOPS | Progress: (12/20) | 13.47 s
    [Task 14/25]  Current/Best:    4.00/  19.18 GFLOPS | Progress: (16/20) | 15.81 s
    [Task 14/25]  Current/Best:   13.48/  21.64 GFLOPS | Progress: (20/20) | 18.76 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   18.94/  18.94 GFLOPS | Progress: (4/20) | 3.35 s
    [Task 15/25]  Current/Best:   16.01/  18.94 GFLOPS | Progress: (8/20) | 5.58 s
    [Task 15/25]  Current/Best:    9.18/  18.94 GFLOPS | Progress: (12/20) | 9.13 s Done.
-
    [Task 15/25]  Current/Best:   21.86/  21.86 GFLOPS | Progress: (16/20) | 11.11 s
    [Task 15/25]  Current/Best:   15.42/  21.86 GFLOPS | Progress: (20/20) | 12.60 s Done.
-
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   10.56/  18.46 GFLOPS | Progress: (4/20) | 4.23 s
    [Task 16/25]  Current/Best:    3.07/  18.46 GFLOPS | Progress: (8/20) | 6.09 s
    [Task 16/25]  Current/Best:   18.66/  18.66 GFLOPS | Progress: (12/20) | 8.30 s
    [Task 16/25]  Current/Best:    3.03/  20.00 GFLOPS | Progress: (16/20) | 11.28 s
    [Task 16/25]  Current/Best:   10.59/  20.00 GFLOPS | Progress: (20/20) | 14.23 s Done.
-
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   10.50/  17.21 GFLOPS | Progress: (4/20) | 4.00 s
    [Task 17/25]  Current/Best:   11.94/  17.31 GFLOPS | Progress: (8/20) | 6.20 s
    [Task 17/25]  Current/Best:   10.69/  18.69 GFLOPS | Progress: (12/20) | 8.66 s
    [Task 17/25]  Current/Best:   12.14/  18.69 GFLOPS | Progress: (16/20) | 14.50 s
    [Task 17/25]  Current/Best:   11.74/  21.32 GFLOPS | Progress: (20/20) | 16.58 s Done.
-
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:    2.99/  16.91 GFLOPS | Progress: (4/20) | 4.08 s
    [Task 18/25]  Current/Best:   11.11/  19.32 GFLOPS | Progress: (8/20) | 7.06 s
    [Task 18/25]  Current/Best:   17.72/  19.32 GFLOPS | Progress: (12/20) | 9.28 s
    [Task 18/25]  Current/Best:    6.24/  19.32 GFLOPS | Progress: (16/20) | 12.39 s
    [Task 18/25]  Current/Best:    6.16/  19.32 GFLOPS | Progress: (20/20) | 15.26 s Done.
-
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:   19.86/  19.86 GFLOPS | Progress: (4/20) | 4.57 s
    [Task 19/25]  Current/Best:   12.24/  19.86 GFLOPS | Progress: (8/20) | 8.43 s
    [Task 19/25]  Current/Best:   20.60/  20.60 GFLOPS | Progress: (12/20) | 11.37 s
    [Task 19/25]  Current/Best:   18.12/  20.60 GFLOPS | Progress: (16/20) | 14.31 s
    [Task 19/25]  Current/Best:   14.09/  20.60 GFLOPS | Progress: (20/20) | 17.88 s Done.
-
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:   20.99/  20.99 GFLOPS | Progress: (4/20) | 4.44 s
    [Task 20/25]  Current/Best:   13.83/  20.99 GFLOPS | Progress: (8/20) | 6.18 s
    [Task 20/25]  Current/Best:   16.74/  20.99 GFLOPS | Progress: (12/20) | 8.31 s
    [Task 20/25]  Current/Best:   16.87/  20.99 GFLOPS | Progress: (16/20) | 10.89 s
    [Task 20/25]  Current/Best:   13.87/  20.99 GFLOPS | Progress: (20/20) | 16.77 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:   10.46/  13.00 GFLOPS | Progress: (4/20) | 3.46 s
    [Task 21/25]  Current/Best:   17.41/  17.41 GFLOPS | Progress: (8/20) | 5.99 s Done.
-
    [Task 21/25]  Current/Best:   17.35/  17.41 GFLOPS | Progress: (12/20) | 7.85 s
    [Task 21/25]  Current/Best:    8.34/  21.60 GFLOPS | Progress: (16/20) | 10.65 s
    [Task 21/25]  Current/Best:   10.56/  21.60 GFLOPS | Progress: (20/20) | 12.41 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:   13.18/  13.18 GFLOPS | Progress: (4/20) | 4.36 s
    [Task 22/25]  Current/Best:   17.35/  17.35 GFLOPS | Progress: (8/20) | 5.92 s
    [Task 22/25]  Current/Best:    8.05/  17.35 GFLOPS | Progress: (12/20) | 8.86 s
    [Task 22/25]  Current/Best:    5.21/  17.35 GFLOPS | Progress: (16/20) | 10.92 s
    [Task 22/25]  Current/Best:    5.28/  20.03 GFLOPS | Progress: (20/20) | 12.46 s Done.
-
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   12.01/  15.04 GFLOPS | Progress: (4/20) | 5.53 s
    [Task 23/25]  Current/Best:   18.57/  20.58 GFLOPS | Progress: (8/20) | 7.88 s
    [Task 23/25]  Current/Best:   12.00/  20.58 GFLOPS | Progress: (12/20) | 12.52 s
    [Task 23/25]  Current/Best:    1.55/  20.58 GFLOPS | Progress: (16/20) | 17.71 s
    [Task 23/25]  Current/Best:   10.10/  20.58 GFLOPS | Progress: (20/20) | 21.59 s Done.
-
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    3.78/   3.78 GFLOPS | Progress: (4/20) | 12.11 s
    [Task 24/25]  Current/Best:    1.66/   3.78 GFLOPS | Progress: (8/20) | 22.65 s
    [Task 24/25]  Current/Best:    1.47/  10.01 GFLOPS | Progress: (12/20) | 30.11 s
    [Task 24/25]  Current/Best:    3.14/  10.01 GFLOPS | Progress: (16/20) | 40.80 s
    [Task 24/25]  Current/Best:    3.44/  10.01 GFLOPS | Progress: (20/20) | 52.52 s
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
+
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:    6.30/  16.02 GFLOPS | Progress: (4/20) | 8.76 s
    [Task  1/25]  Current/Best:    5.58/  18.29 GFLOPS | Progress: (8/20) | 12.27 s
    [Task  1/25]  Current/Best:   12.76/  18.29 GFLOPS | Progress: (12/20) | 16.61 s
    [Task  1/25]  Current/Best:   11.32/  18.29 GFLOPS | Progress: (16/20) | 19.31 s
    [Task  1/25]  Current/Best:   16.04/  18.29 GFLOPS | Progress: (20/20) | 21.35 s Done.
+
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.10/  13.19 GFLOPS | Progress: (4/20) | 3.01 s
    [Task  2/25]  Current/Best:   14.59/  16.67 GFLOPS | Progress: (8/20) | 4.28 s
    [Task  2/25]  Current/Best:   15.14/  16.67 GFLOPS | Progress: (12/20) | 5.64 s
    [Task  2/25]  Current/Best:   20.51/  20.51 GFLOPS | Progress: (16/20) | 6.76 s
    [Task  2/25]  Current/Best:   10.88/  20.51 GFLOPS | Progress: (20/20) | 8.38 s Done.
+
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    8.24/  16.36 GFLOPS | Progress: (4/20) | 3.57 s
    [Task  3/25]  Current/Best:   17.74/  17.74 GFLOPS | Progress: (8/20) | 5.88 s
    [Task  3/25]  Current/Best:   12.29/  18.40 GFLOPS | Progress: (12/20) | 7.82 s
    [Task  3/25]  Current/Best:    9.81/  18.83 GFLOPS | Progress: (16/20) | 9.47 s
    [Task  3/25]  Current/Best:    7.45/  18.83 GFLOPS | Progress: (20/20) | 13.35 s Done.
+
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:   10.97/  16.77 GFLOPS | Progress: (4/20) | 3.70 s
    [Task  4/25]  Current/Best:    8.30/  16.77 GFLOPS | Progress: (8/20) | 8.81 s
    [Task  4/25]  Current/Best:   18.89/  20.29 GFLOPS | Progress: (12/20) | 10.24 s
    [Task  4/25]  Current/Best:    9.57/  20.29 GFLOPS | Progress: (16/20) | 13.28 s
    [Task  4/25]  Current/Best:   11.02/  20.29 GFLOPS | Progress: (20/20) | 18.27 s Done.
+
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    6.45/  19.24 GFLOPS | Progress: (4/20) | 3.41 s
    [Task  5/25]  Current/Best:    4.71/  19.24 GFLOPS | Progress: (8/20) | 5.56 s
    [Task  5/25]  Current/Best:    8.27/  19.24 GFLOPS | Progress: (12/20) | 7.43 s
    [Task  5/25]  Current/Best:   17.30/  19.24 GFLOPS | Progress: (16/20) | 8.82 s
    [Task  5/25]  Current/Best:    8.43/  19.24 GFLOPS | Progress: (20/20) | 10.50 s Done.
+
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   11.19/  18.44 GFLOPS | Progress: (4/20) | 4.45 s
    [Task  6/25]  Current/Best:   17.87/  18.44 GFLOPS | Progress: (8/20) | 6.87 s
    [Task  6/25]  Current/Best:   14.84/  18.44 GFLOPS | Progress: (12/20) | 9.45 s
    [Task  6/25]  Current/Best:   12.30/  18.44 GFLOPS | Progress: (16/20) | 11.96 s
    [Task  6/25]  Current/Best:   18.04/  18.44 GFLOPS | Progress: (20/20) | 14.25 s Done.
+
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:    8.45/  20.43 GFLOPS | Progress: (4/20) | 3.56 s
    [Task  7/25]  Current/Best:   19.54/  20.43 GFLOPS | Progress: (8/20) | 5.53 s
    [Task  7/25]  Current/Best:    7.12/  20.43 GFLOPS | Progress: (12/20) | 9.18 s
    [Task  7/25]  Current/Best:    4.79/  20.43 GFLOPS | Progress: (16/20) | 11.50 s
    [Task  7/25]  Current/Best:   18.26/  20.43 GFLOPS | Progress: (20/20) | 13.22 s Done.
+
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:   10.86/  21.23 GFLOPS | Progress: (4/20) | 4.72 s
    [Task  8/25]  Current/Best:   12.20/  21.23 GFLOPS | Progress: (8/20) | 7.58 s
    [Task  8/25]  Current/Best:   16.26/  21.23 GFLOPS | Progress: (12/20) | 12.29 s
    [Task  8/25]  Current/Best:    8.91/  21.23 GFLOPS | Progress: (16/20) | 23.46 s
    [Task  8/25]  Current/Best:   15.06/  21.23 GFLOPS | Progress: (20/20) | 30.96 s
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   22.97/  22.97 GFLOPS | Progress: (4/20) | 3.24 s
    [Task  9/25]  Current/Best:    6.35/  22.97 GFLOPS | Progress: (8/20) | 14.21 s
    [Task  9/25]  Current/Best:    9.57/  22.97 GFLOPS | Progress: (12/20) | 16.84 s
    [Task  9/25]  Current/Best:   19.72/  22.97 GFLOPS | Progress: (16/20) | 19.52 s
    [Task  9/25]  Current/Best:   15.49/  22.97 GFLOPS | Progress: (20/2
 0) | 27.82 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:    1.61/  18.26 GFLOPS | Progress: (4/20) | 3.81 s
    [Task 10/25]  Current/Best:    5.08/  18.26 GFLOPS | Progress: (8/20) | 5.98 s
    [Task 10/25]  Current/Best:   13.31/  18.26 GFLOPS | Progress: (12/20) | 8.21 s
    [Task 10/25]  Current/Best:   13.18/  20.96 GFLOPS | Progress: (16/20) | 9.81 s
    [Task 10/25]  Current/Best:   18.23/  20.96 GFLOPS | Progress: (20/20) | 12.13 s Done.
+
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:    8.51/  18.10 GFLOPS | Progress: (4/20) | 3.62 s
    [Task 11/25]  Current/Best:   20.09/  20.09 GFLOPS | Progress: (8/20) | 5.75 s
    [Task 11/25]  Current/Best:    3.13/  21.09 GFLOPS | Progress: (12/20) | 8.15 s
    [Task 11/25]  Current/Best:   10.01/  23.89 GFLOPS | Progress: (16/20) | 10.09 s
    [Task 11/25]  Current/Best:    6.50/  23.89 GFLOPS | Progress: (20/20) | 13.01 s Done.
+
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    9.64/  10.75 GFLOPS | Progress: (4/20) | 6.72 s
    [Task 12/25]  Current/Best:   14.44/  21.04 GFLOPS | Progress: (8/20) | 8.68 s
    [Task 12/25]  Current/Best:   14.24/  21.04 GFLOPS | Progress: (12/20) | 12.11 s
    [Task 12/25]  Current/Best:    7.33/  21.04 GFLOPS | Progress: (16/20) | 14.05 s Done.
      Done.
-
    [Task 25/25]  Current/Best:    3.49/   7.44 GFLOPS | Progress: (4/20) | 2.96 s
    [Task 25/25]  Current/Best:    8.99/   8.99 GFLOPS | Progress: (8/20) | 13.66 s
    [Task 25/25]  Current/Best:    6.05/   8.99 GFLOPS | Progress: (12/20) | 25.32 s
    [Task 25/25]  Current/Best:    9.04/   9.04 GFLOPS | Progress: (16/20) | 30.87 s
    [Task 25/25]  Current/Best:    4.74/   9.05 GFLOPS | Progress: (20/20) | 32.12 s
+
    [Task 12/25]  Current/Best:    4.40/  21.04 GFLOPS | Progress: (20/20) | 17.28 s Done.
+
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    9.92/  17.36 GFLOPS | Progress: (4/20) | 3.89 s
    [Task 13/25]  Current/Best:   18.24/  18.24 GFLOPS | Progress: (8/20) | 7.12 s
    [Task 13/25]  Current/Best:   18.56/  18.56 GFLOPS | Progress: (12/20) | 9.73 s
    [Task 13/25]  Current/Best:    9.15/  18.58 GFLOPS | Progress: (16/20) | 12.70 s
    [Task 13/25]  Current/Best:    5.75/  18.58 GFLOPS | Progress: (20/20) | 16.49 s Done.
+
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   16.18/  17.99 GFLOPS | Progress: (4/20) | 3.63 s
    [Task 14/25]  Current/Best:    3.73/  17.99 GFLOPS | Progress: (8/20) | 5.71 s
    [Task 14/25]  Current/Best:   17.32/  17.99 GFLOPS | Progress: (12/20) | 8.23 s
    [Task 14/25]  Current/Best:    3.02/  17.99 GFLOPS | Progress: (16/20) | 11.39 s
    [Task 14/25]  Current/Best:    8.99/  21.47 GFLOPS | Progress: (20/20) | 13.05 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   10.33/  10.33 GFLOPS | Progress: (4/20) | 7.04 s
    [Task 15/25]  Current/Best:   16.52/  17.57 GFLOPS | Progress: (8/20) | 9.75 s
    [Task 15/25]  Current/Best:   17.24/  18.01 GFLOPS | Progress: (12/20) | 11.01 s
    [Task 15/25]  Current/Best:    8.38/  18.01 GFLOPS | Progress: (16/20) | 12.41 s
    [Task 15/25]  Current/Best:   20.18/  20.18 GFLOPS | Progress: (20/20)
  | 15.15 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   14.08/  16.10 GFLOPS | Progress: (4/20) | 3.52 s
    [Task 16/25]  Current/Best:    7.51/  18.33 GFLOPS | Progress: (8/20) | 6.46 s
    [Task 16/25]  Current/Best:   13.62/  18.33 GFLOPS | Progress: (12/20) | 7.86 s
    [Task 16/25]  Current/Best:   11.88/  18.33 GFLOPS | Progress: (16/20) | 9.68 s
    [Task 16/25]  Current/Best:   19.83/  19.83 GFLOPS | Progress: (20/20) | 10.95 s Done.
+
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
+     Done.
+
    [Task 17/25]  Current/Best:   23.90/  23.90 GFLOPS | Progress: (4/20) | 3.59 s
    [Task 17/25]  Current/Best:    3.06/  23.90 GFLOPS | Progress: (8/20) | 6.48 s
    [Task 17/25]  Current/Best:   18.87/  23.90 GFLOPS | Progress: (12/20) | 9.29 s
    [Task 17/25]  Current/Best:   11.24/  23.90 GFLOPS | Progress: (16/20) | 12.20 s
    [Task 17/25]  Current/Best:   13.64/  23.90 GFLOPS | Progress: (20/20) | 15.10 s Done.
+
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:    9.20/  19.70 GFLOPS | Progress: (4/20) | 3.29 s
    [Task 18/25]  Current/Best:   15.73/  22.18 GFLOPS | Progress: (8/20) | 4.67 s
    [Task 18/25]  Current/Best:   18.75/  22.18 GFLOPS | Progress: (12/20) | 6.22 s
    [Task 18/25]  Current/Best:   13.73/  22.18 GFLOPS | Progress: (16/20) | 12.25 s
    [Task 18/25]  Current/Best:   12.95/  22.18 GFLOPS | Progress: (20/20) | 14.55 s Done.
+
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:   11.73/  21.49 GFLOPS | Progress: (4/20) | 5.70 s
    [Task 19/25]  Current/Best:    5.38/  22.19 GFLOPS | Progress: (8/20) | 9.07 s
    [Task 19/25]  Current/Best:   11.66/  22.19 GFLOPS | Progress: (12/20) | 11.82 s
    [Task 19/25]  Current/Best:   11.29/  22.19 GFLOPS | Progress: (16/20) | 14.52 s
    [Task 19/25]  Current/Best:   22.44/  22.44 GFLOPS | Progress: (20/20) | 16.45 s Done.
+
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:   18.37/  18.37 GFLOPS | Progress: (4/20) | 3.69 s
    [Task 20/25]  Current/Best:   14.28/  21.65 GFLOPS | Progress: (8/20) | 6.74 s
    [Task 20/25]  Current/Best:   10.12/  21.65 GFLOPS | Progress: (12/20) | 8.44 s
    [Task 20/25]  Current/Best:   10.23/  21.65 GFLOPS | Progress: (16/20) | 10.63 s
    [Task 20/25]  Current/Best:   10.34/  21.65 GFLOPS | Progress: (20/20) | 13.57 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:   17.50/  17.50 GFLOPS | Progress: (4/20) | 3.64 s
    [Task 21/25]  Current/Best:    6.70/  17.50 GFLOPS | Progress: (8/20) | 6.78 s
    [Task 21/25]  Current/Best:    2.70/  17.50 GFLOPS | Progress: (12/20) | 11.39 s Done.
+
    [Task 21/25]  Current/Best:   14.55/  18.53 GFLOPS | Progress: (16/20) | 13.26 s
    [Task 21/25]  Current/Best:   21.30/  21.30 GFLOPS | Progress: (20/20) | 14.50 s Done.
+
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:   12.14/  17.90 GFLOPS | Progress: (4/20) | 3.35 s
    [Task 22/25]  Current/Best:    9.73/  22.40 GFLOPS | Progress: (8/20) | 5.40 s
    [Task 22/25]  Current/Best:   17.49/  22.40 GFLOPS | Progress: (12/20) | 7.37 s
    [Task 22/25]  Current/Best:    8.50/  22.40 GFLOPS | Progress: (16/20) | 9.17 s
    [Task 22/25]  Current/Best:    7.64/  22.40 GFLOPS | Progress: (20/20) | 10.78 s Done.
+
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   11.90/  20.14 GFLOPS | Progress: (4/20) | 3.74 s
    [Task 23/25]  Current/Best:   10.70/  20.14 GFLOPS | Progress: (8/20) | 7.00 s
    [Task 23/25]  Current/Best:    9.68/  20.14 GFLOPS | Progress: (12/20) | 9.71 s
    [Task 23/25]  Current/Best:   19.18/  22.44 GFLOPS | Progress: (16/20) | 12.16 s
    [Task 23/25]  Current/Best:   20.12/  22.44 GFLOPS | Progress: (20/20) | 15.14 s Done.
+
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    5.42/  10.21 GFLOPS | Progress: (4/20) | 4.24 s
    [Task 24/25]  Current/Best:    4.85/  10.21 GFLOPS | Progress: (8/20) | 10.23 s
    [Task 24/25]  Current/Best:   10.05/  10.21 GFLOPS | Progress: (12/20) | 17.83 s
    [Task 24/25]  Current/Best:    3.91/  10.21 GFLOPS | Progress: (16/20) | 28.54 s
    [Task 24/25]  Current/Best:    3.39/  10.21 GFLOPS | Progress: (20/20) | 39.26 s
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    5.69/   7.08 GFLOPS | Progress: (4/20) | 12.95 s Done.
+
    [Task 25/25]  Current/Best:    5.72/   8.59 GFLOPS | Progress: (8/20) | 14.69 s
    [Task 25/25]  Current/Best:    1.55/   8.59 GFLOPS | Progress: (12/20) | 19.83 s
    [Task 25/25]  Current/Best:    5.77/   8.75 GFLOPS | Progress: (16/20) | 24.88 s
    [Task 25/25]  Current/Best:    3.91/   8.75 GFLOPS | Progress: (20/20) | 30.22 s Done.
+
 
 
 
@@ -674,8 +675,8 @@ Verify that the optimized model runs and produces the same results:
 
  .. code-block:: none
 
-    class='n02123045 tabby, tabby cat' with probability=0.621102
-    class='n02123159 tiger cat' with probability=0.356379
+    class='n02123045 tabby, tabby cat' with probability=0.621104
+    class='n02123159 tiger cat' with probability=0.356378
     class='n02124075 Egyptian cat' with probability=0.019712
     class='n02129604 tiger, Panthera tigris' with probability=0.001215
     class='n04040759 radiator' with probability=0.000262
@@ -732,8 +733,8 @@ improvement in comparing the optimized model to the unoptimized model.
 
  .. code-block:: none
 
-    optimized: {'mean': 417.65824050000447, 'median': 418.0077908000044, 'std': 4.044211110186569}
-    unoptimized: {'mean': 519.9615746099993, 'median': 520.8512474500026, 'std': 2.491058036681814}
+    optimized: {'mean': 410.64341234000494, 'median': 411.674945499999, 'std': 2.992942377835809}
+    unoptimized: {'mean': 519.9631207500033, 'median': 519.4895130000077, 'std': 3.823469731553201}
 
 
 
@@ -756,7 +757,7 @@ profiling/benchmarking.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 10 minutes  45.363 seconds)
+   **Total running time of the script:** ( 10 minutes  32.397 seconds)
 
 
 .. _sphx_glr_download_tutorial_autotvm_relay_x86.py:
diff --git a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
index 7d7e32aa6a..2768fa0e1f 100644
--- a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
@@ -270,7 +270,7 @@ device and returns the measured cost. Network overhead is excluded.
 
  .. code-block:: none
 
-    1.261e-07 secs/op
+    1.262e-07 secs/op
 
 
 
diff --git a/docs/_sources/tutorial/intro_topi.rst.txt b/docs/_sources/tutorial/intro_topi.rst.txt
index fb7e3c9d1e..fe3878867d 100644
--- a/docs/_sources/tutorial/intro_topi.rst.txt
+++ b/docs/_sources/tutorial/intro_topi.rst.txt
@@ -263,7 +263,7 @@ As you can see, scheduled stages of computation have been accumulated and we can
 
  .. code-block:: none
 
-    [stage(a, placeholder(a, 0xa50fa70)), stage(b, placeholder(b, 0x21fc3170)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
+    [stage(a, placeholder(a, 0x3751670)), stage(b, placeholder(b, 0x18707210)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min [...]
 
 
 
diff --git a/docs/_sources/tutorial/sg_execution_times.rst.txt b/docs/_sources/tutorial/sg_execution_times.rst.txt
index 56307f1fce..06ba6b2c80 100644
--- a/docs/_sources/tutorial/sg_execution_times.rst.txt
+++ b/docs/_sources/tutorial/sg_execution_times.rst.txt
@@ -5,32 +5,32 @@
 
 Computation times
 =================
-**14:18.064** total execution time for **tutorial** files:
+**13:53.607** total execution time for **tutorial** files:
 
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:45.363 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)                 | 10:32.397 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 01:33.374 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``) | 01:29.784 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 00:59.992 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)     | 00:59.972 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:36.726 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)                 | 00:35.579 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:20.427 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)               | 00:14.334 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:01.221 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.758 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)                               | 00:00.776 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)       | 00:00.628 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.177 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``) | 00:00.148 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)                           | 00:00.005 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_uma.py` (``uma.py``)                                             | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)   | 00:00.001 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)                             | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
 | :ref:`sphx_glr_tutorial_install.py` (``install.py``)                                     | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
-| :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)                             | 00:00.001 | 0.0 MB |
+| :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)   | 00:00.001 | 0.0 MB |
 +------------------------------------------------------------------------------------------+-----------+--------+
diff --git a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
index f1f45269b0..9084851a91 100644
--- a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
+++ b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
@@ -294,8 +294,8 @@ helper function to run a profile of the TVM generated code.
 
  .. code-block:: none
 
-    Numpy running time: 0.000012
-    naive: 0.000006
+    Numpy running time: 0.000008
+    naive: 0.000007
 
 
 
@@ -394,7 +394,7 @@ compile and run this new schedule with the parallel operation applied:
 
  .. code-block:: none
 
-    parallel: 0.000010
+    parallel: 0.000007
 
 
 
@@ -501,10 +501,10 @@ We can now compare the different schedules
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                   numpy    1.1799070000506617e-05                   1.0
-                   naive    5.833399999999999e-06    0.49439489720372287
-                parallel    1.0206899999999999e-05    0.8650597038208727
-                  vector             2.45952e-05       2.084503270083486
+                   numpy    7.690120000916067e-06                    1.0
+                   naive              6.7037e-06      0.8717289196009215
+                parallel    6.941300000000001e-06      0.902625706643477
+                  vector    2.4571799999999997e-05     3.195242726650942
 
 
 
@@ -925,7 +925,7 @@ matrix multiplication.
 
  .. code-block:: none
 
-    Numpy running time: 0.018890
+    Numpy running time: 0.018246
 
 
 
@@ -983,7 +983,7 @@ optimizations.
 
  .. code-block:: none
 
-    none: 3.273652
+    none: 3.275580
 
 
 
@@ -1086,7 +1086,7 @@ schedule.
 
  .. code-block:: none
 
-    blocking: 0.326503
+    blocking: 0.305448
 
 
 
@@ -1182,7 +1182,7 @@ already cache friendly from our previous optimizations.
 
  .. code-block:: none
 
-    vectorization: 0.352691
+    vectorization: 0.340588
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1256,7 +1256,7 @@ more cache friendly.
 
  .. code-block:: none
 
-    loop permutation: 0.126409
+    loop permutation: 0.112845
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1355,7 +1355,7 @@ optimized schedule.
 
  .. code-block:: none
 
-    array packing: 0.110843
+    array packing: 0.107526
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1448,7 +1448,7 @@ to `C` when all the block results are ready.
 
  .. code-block:: none
 
-    block caching: 0.112136
+    block caching: 0.111202
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1534,7 +1534,7 @@ of thread-level parallelization.
 
  .. code-block:: none
 
-    parallelization: 0.147541
+    parallelization: 0.146423
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1615,13 +1615,13 @@ working, we can compare the results.
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                    none      3.2736517458999996                     1.0
-                blocking            0.3265026695     0.09973653120217195
-           vectorization            0.3526911035      0.1077362929461629
-        loop permutation     0.12640937730000001    0.038614179855361266
-           array packing     0.11084349060000001    0.033859279851261845
-           block caching     0.11213572229999999     0.03425401692175763
-         parallelization     0.14754136820000002     0.04506935363078385
+                    none      3.2755796965000004                     1.0
+                blocking     0.30544793070000004     0.09325003785631446
+           vectorization             0.340588204     0.10397799338050695
+        loop permutation     0.11284525279999999     0.03445046778149731
+           array packing     0.10752643740000001    0.032826689429933095
+           block caching             0.111202406     0.03394892394736151
+         parallelization            0.1464231732     0.04470145341188159
 
 
 
diff --git a/docs/commit_hash b/docs/commit_hash
index 2715a591e7..e09958124b 100644
--- a/docs/commit_hash
+++ b/docs/commit_hash
@@ -1 +1 @@
-bac450a645c3f9e3a69a6f7af207cff462250bcf
+4f4b4edafdb0837972e4df6f570368f9f7cefd20
diff --git a/docs/contribute/index.html b/docs/contribute/index.html
index 090c528800..76b793b970 100644
--- a/docs/contribute/index.html
+++ b/docs/contribute/index.html
@@ -440,7 +440,7 @@ design choices of the internal.</p></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#prepare-the-release-candidate">Prepare the Release Candidate</a></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#prepare-the-gpg-key">Prepare the GPG Key</a></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#cut-a-release-candidate">Cut a Release Candidate</a></li>
-<li class="toctree-l2"><a class="reference internal" href="release_process.html#update-tvm-version-on-main">Update TVM Version on Main</a></li>
+<li class="toctree-l2"><a class="reference internal" href="release_process.html#update-tvm-version-on-main">Update TVM Version on <code class="docutils literal notranslate"><span class="pre">main</span></code></a></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#upload-the-release-candidate">Upload the Release Candidate</a></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#call-a-vote-on-the-release-candidate">Call a Vote on the Release Candidate</a></li>
 <li class="toctree-l2"><a class="reference internal" href="release_process.html#post-the-release">Post the Release</a></li>
diff --git a/docs/contribute/release_process.html b/docs/contribute/release_process.html
index f9f57c9f63..e583fd9aca 100644
--- a/docs/contribute/release_process.html
+++ b/docs/contribute/release_process.html
@@ -243,7 +243,7 @@
 <li class="toctree-l3"><a class="reference internal" href="#prepare-the-release-candidate">Prepare the Release Candidate</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#prepare-the-gpg-key">Prepare the GPG Key</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#cut-a-release-candidate">Cut a Release Candidate</a></li>
-<li class="toctree-l3"><a class="reference internal" href="#update-tvm-version-on-main">Update TVM Version on Main</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#update-tvm-version-on-main">Update TVM Version on <code class="docutils literal notranslate"><span class="pre">main</span></code></a></li>
 <li class="toctree-l3"><a class="reference internal" href="#upload-the-release-candidate">Upload the Release Candidate</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#call-a-vote-on-the-release-candidate">Call a Vote on the Release Candidate</a></li>
 <li class="toctree-l3"><a class="reference internal" href="#post-the-release">Post the Release</a></li>
@@ -374,7 +374,7 @@
 <li><p><a class="reference internal" href="#prepare-the-release-candidate" id="id3">Prepare the Release Candidate</a></p></li>
 <li><p><a class="reference internal" href="#prepare-the-gpg-key" id="id4">Prepare the GPG Key</a></p></li>
 <li><p><a class="reference internal" href="#cut-a-release-candidate" id="id5">Cut a Release Candidate</a></p></li>
-<li><p><a class="reference internal" href="#update-tvm-version-on-main" id="id6">Update TVM Version on Main</a></p></li>
+<li><p><a class="reference internal" href="#update-tvm-version-on-main" id="id6">Update TVM Version on <code class="docutils literal notranslate"><span class="pre">main</span></code></a></p></li>
 <li><p><a class="reference internal" href="#upload-the-release-candidate" id="id7">Upload the Release Candidate</a></p></li>
 <li><p><a class="reference internal" href="#call-a-vote-on-the-release-candidate" id="id8">Call a Vote on the Release Candidate</a></p></li>
 <li><p><a class="reference internal" href="#post-the-release" id="id9">Post the Release</a></p></li>
@@ -420,7 +420,7 @@
 <h2><a class="toc-backref" href="#id4">Prepare the GPG Key</a><a class="headerlink" href="#prepare-the-gpg-key" title="Permalink to this headline">¶</a></h2>
 <p>You can skip this section if you have already uploaded your key.</p>
 <p>After generating the gpg key, you need to upload your key to a public key server. Please refer to <a class="reference external" href="https://www.apache.org/dev/openpgp.html#generate-key">https://www.apache.org/dev/openpgp.html#generate-key</a> for details.</p>
-<p>If you want to do the release on another machine, you can transfer your gpg key to that machine via the <code class="code docutils literal notranslate"><span class="pre">gpg</span> <span class="pre">--export</span></code> and <code class="code docutils literal notranslate"><span class="pre">gpg</span> <span class="pre">--import</span></code> commands.</p>
+<p>If you want to do the release on another machine, you can transfer your gpg key to that machine via the <code class="docutils literal notranslate"><span class="pre">gpg</span> <span class="pre">--export</span></code> and <code class="docutils literal notranslate"><span class="pre">gpg</span> <span class="pre">--import</span></code> commands.</p>
 <p>The last step is to update the KEYS file with your code signing key <a class="reference external" href="https://www.apache.org/dev/openpgp.html#export-public-key">https://www.apache.org/dev/openpgp.html#export-public-key</a>. Check in the changes to the TVM main branch, as well as ASF SVN,</p>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># the --depth=files will avoid checkout existing folders</span>
 svn co --depth<span class="o">=</span>files <span class="s2">&quot;https://dist.apache.org/repos/dist/dev/tvm&quot;</span> svn-tvm
@@ -443,16 +443,16 @@ git branch v0.6.0
 git push --set-upstream origin v0.6.0
 </pre></div>
 </div>
-<p>(<em>Make sure the version numbers in the source code are correct.</em> Run <code class="code docutils literal notranslate"><span class="pre">python3</span> <span class="pre">version.py</span></code> to update the version.)</p>
+<p>(<em>Make sure the version numbers in the source code are correct.</em> Run <code class="docutils literal notranslate"><span class="pre">python3</span> <span class="pre">version.py</span></code> to update the version.)</p>
 <p>Go to the GitHub repositories “releases” tab and click “Draft a new release”,</p>
 <ul class="simple">
-<li><p>Provide the release tag in the form of “v1.0.0.rc0” where 0 means it’s the first release candidate</p></li>
+<li><p>Provide the release tag in the form of <code class="docutils literal notranslate"><span class="pre">v1.0.0.rc0</span></code> where 0 means it’s the first release candidate. The tag must match this pattern <code class="docutils literal notranslate"><span class="pre">v[0-9]+\.[0-9]+\.[0-9]+\.rc[0-9]</span></code> exactly!</p></li>
 <li><p>Select the commit by clicking Target: branch &gt; Recent commits &gt; $commit_hash</p></li>
 <li><p>Copy and paste release note draft into the description box</p></li>
 <li><p>Select “This is a pre-release”</p></li>
 <li><p>Click “Publish release”</p></li>
 </ul>
-<p>Notice that one can still apply changes to the BRANCH after the cut, while the TAG is fixed. If any change is required for this release, a new TAG has to be created.</p>
+<p>Notice that one can still apply changes to the branch after the cut, while the tag is fixed. If any change is required for this release, a new tag has to be created.</p>
 <p>Remove previous release candidate (if applied),</p>
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git push --delete origin v0.6.0.rc1
 </pre></div>
@@ -483,10 +483,13 @@ shasum -a <span class="m">512</span> apache-tvm-src-v0.6.0.rc0.tar.gz &gt; apach
 </div>
 </div>
 <div class="section" id="update-tvm-version-on-main">
-<h2><a class="toc-backref" href="#id6">Update TVM Version on Main</a><a class="headerlink" href="#update-tvm-version-on-main" title="Permalink to this headline">¶</a></h2>
-<p>After cutting a release candidate, make sure to update the version numbers throughout <cite>main</cite>. For example if we are
-releasing <cite>v0.10.0</cite> we want to bump the version numbers throughout the codebase from <cite>v0.10.dev0</cite> to <cite>v0.11.dev0</cite>. An
-example of how to do this can be found here: <a class="reference external" href="https://github.com/apache/tvm/pull/12190">https://github.com/apache/tvm/pull/12190</a>.</p>
+<h2><a class="toc-backref" href="#id6">Update TVM Version on <code class="docutils literal notranslate"><span class="pre">main</span></code></a><a class="headerlink" href="#update-tvm-version-on-main" title="Permalink to this headline">¶</a></h2>
+<p>After cutting a release candidate, make sure to update the version numbers throughout <code class="docutils literal notranslate"><span class="pre">main</span></code>. For example if we are
+releasing <code class="docutils literal notranslate"><span class="pre">v0.10.0</span></code> we want to bump the version numbers throughout the codebase from <code class="docutils literal notranslate"><span class="pre">v0.10.dev0</span></code> to <code class="docutils literal notranslate"><span class="pre">v0.11.dev0</span></code>. An
+example of how to do this can be found here: <a class="reference external" href="https://github.com/apache/tvm/pull/12190">https://github.com/apache/tvm/pull/12190</a>.
+Tag the commit on <code class="docutils literal notranslate"><span class="pre">main</span></code> immediately after the last one included in the release branch with the dev tag (e.g. <code class="docutils literal notranslate"><span class="pre">v0.11.dev0</span></code>)
+for the next release. This tag is necessary so that the nightly packages built from <code class="docutils literal notranslate"><span class="pre">main</span></code> have the correct version
+number.</p>
 </div>
 <div class="section" id="upload-the-release-candidate">
 <h2><a class="toc-backref" href="#id7">Upload the Release Candidate</a><a class="headerlink" href="#upload-the-release-candidate" title="Permalink to this headline">¶</a></h2>
@@ -504,15 +507,15 @@ svn ci --username <span class="nv">$ASF_USERNAME</span> --password <span class="
 </div>
 <div class="section" id="call-a-vote-on-the-release-candidate">
 <h2><a class="toc-backref" href="#id8">Call a Vote on the Release Candidate</a><a class="headerlink" href="#call-a-vote-on-the-release-candidate" title="Permalink to this headline">¶</a></h2>
-<p>The first voting takes place on the Apache TVM developers list (<a class="reference external" href="mailto:dev&#37;&#52;&#48;tvm&#46;apache&#46;org">dev<span>&#64;</span>tvm<span>&#46;</span>apache<span>&#46;</span>org</a>). To get more attention, one can create a github issue start with “[VOTE]” instead, it will be mirrored to dev&#64; automatically. Look at past voting threads to see how this proceeds. The email should follow this format.</p>
+<p>The first voting takes place on the Apache TVM developers list (<a class="reference external" href="mailto:dev&#37;&#52;&#48;tvm&#46;apache&#46;org">dev<span>&#64;</span>tvm<span>&#46;</span>apache<span>&#46;</span>org</a>). To get more attention, one can create a GitHub issue start with “[VOTE]” instead, it will be mirrored to dev&#64; automatically. Look at past voting threads to see how this proceeds. The email should follow this format.</p>
 <ul class="simple">
 <li><p>Provide the link to the draft of the release notes in the email</p></li>
 <li><p>Provide the link to the release candidate artifacts</p></li>
 <li><p>Make sure the email is in text format and the links are correct</p></li>
 </ul>
 <p>For the dev&#64; vote, there must be at least 3 binding +1 votes and more +1 votes than -1 votes. Once the vote is done, you should also send out a summary email with the totals, with a subject that looks something like [VOTE][RESULT] ….</p>
-<p>In ASF, votes are open “at least” 72hrs (3 days). If you don’t get enough number of binding votes within that time, you cannot close the voting deadline. You need to extend it.</p>
-<p>If the voting fails, the community needs to modified the release accordingly, create a new release candidate and re-run the voting process.</p>
+<p>In ASF, votes are open at least 72 hours (3 days). If you don’t get enough number of binding votes within that time, you cannot close the voting deadline. You need to extend it.</p>
+<p>If the vote fails, the community needs to modify the release accordingly: create a new release candidate and re-run the voting process.</p>
 </div>
 <div class="section" id="post-the-release">
 <h2><a class="toc-backref" href="#id9">Post the Release</a><a class="headerlink" href="#post-the-release" title="Permalink to this headline">¶</a></h2>
@@ -536,7 +539,7 @@ curl <span class="s2">&quot;https://dist.apache.org/repos/dist/dev/tvm/KEYS&quot
 </div>
 <div class="section" id="update-the-tvm-website">
 <h2><a class="toc-backref" href="#id10">Update the TVM Website</a><a class="headerlink" href="#update-the-tvm-website" title="Permalink to this headline">¶</a></h2>
-<p>The website repository is located at <a class="reference external" href="https://github.com/apache/tvm-site">https://github.com/apache/tvm-site</a>. Modify the download page to include the release artifacts as well as the GPG signature and SHA hash. Since TVM’s docs are continually updated, upload a fixed version of the release docs. If CI has deleted the docs from the release by the time you go to update the website, you can restart the CI build for the release branch on Jenkins. See [...]
+<p>The website repository is located at <a class="reference external" href="https://github.com/apache/tvm-site">https://github.com/apache/tvm-site</a>. Modify the download page to include the release artifacts as well as the GPG signature and SHA hash. Since TVM’s docs are continually updated, upload a fixed version of the release docs. If CI has deleted the docs from the release by the time you go to update the website, you can restart the CI build for the release branch on Jenkins. See [...]
 <div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git clone https://github.com/apache/tvm-site.git
 <span class="nb">pushd</span> tvm-site
 git checkout asf-site
diff --git a/docs/how_to/compile_models/from_darknet.html b/docs/how_to/compile_models/from_darknet.html
index be9116b2e4..e47f3c19a2 100644
--- a/docs/how_to/compile_models/from_darknet.html
+++ b/docs/how_to/compile_models/from_darknet.html
@@ -585,7 +585,7 @@ class:[&#39;truck 0.9266&#39;] left:471 top:83 right:689 bottom:169
 class:[&#39;bicycle 0.9984&#39;] left:111 top:113 right:577 bottom:447
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  13.891 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  11.203 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-darknet-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7716f96385bd5abb6e822041e285be54/from_darknet.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_darknet.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/from_keras.html b/docs/how_to/compile_models/from_keras.html
index 5ecd56a28d..84322b7e44 100644
--- a/docs/how_to/compile_models/from_keras.html
+++ b/docs/how_to/compile_models/from_keras.html
@@ -506,7 +506,7 @@ pip install -U tensorflow --user
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Relay top-1 id: 285, class name: Egyptian cat
 
 1/1 [==============================] - ETA: 0s
-1/1 [==============================] - 1s 995ms/step
+1/1 [==============================] - 1s 929ms/step
 Keras top-1 id: 285, class name: Egyptian cat
 </pre></div>
 </div>
diff --git a/docs/how_to/compile_models/from_mxnet.html b/docs/how_to/compile_models/from_mxnet.html
index b80c910c0a..76c54679df 100644
--- a/docs/how_to/compile_models/from_mxnet.html
+++ b/docs/how_to/compile_models/from_mxnet.html
@@ -440,7 +440,7 @@ to download the full example code</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;x&quot;</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#tuple" title="builtins.tuple" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">x</span><span class="o">.</span><span class="n">shape</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zipdd43512b-d571-453b-ae60-5a1e206a0d73 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+<img src="../../_images/sphx_glr_from_mxnet_001.png" srcset="../../_images/sphx_glr_from_mxnet_001.png" alt="from mxnet" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip30e813d4-e803-4991-a5d6-cbe7b7326dff from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
 x (1, 3, 224, 224)
 </pre></div>
 </div>
diff --git a/docs/how_to/compile_models/from_oneflow.html b/docs/how_to/compile_models/from_oneflow.html
index 86b7a17f61..69f87c4945 100644
--- a/docs/how_to/compile_models/from_oneflow.html
+++ b/docs/how_to/compile_models/from_oneflow.html
@@ -448,13 +448,13 @@ Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdo
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip&quot; to /workspace/.oneflow/flowvision_cache/resnet18.zip
 
   0%|          | 0.00/41.5M [00:00&lt;?, ?B/s]
- 15%|#5        | 6.33M/41.5M [00:00&lt;00:00, 55.5MB/s]
- 30%|###       | 12.6M/41.5M [00:00&lt;00:00, 60.8MB/s]
- 44%|####4     | 18.4M/41.5M [00:00&lt;00:00, 42.9MB/s]
- 55%|#####5    | 22.9M/41.5M [00:00&lt;00:00, 43.3MB/s]
- 66%|######5   | 27.3M/41.5M [00:00&lt;00:00, 40.6MB/s]
- 82%|########2 | 34.1M/41.5M [00:00&lt;00:00, 45.8MB/s]
-100%|##########| 41.5M/41.5M [00:00&lt;00:00, 50.2MB/s]
+ 15%|#5        | 6.33M/41.5M [00:00&lt;00:01, 31.2MB/s]
+ 22%|##2       | 9.30M/41.5M [00:00&lt;00:01, 30.5MB/s]
+ 39%|###8      | 16.0M/41.5M [00:00&lt;00:00, 36.2MB/s]
+ 58%|#####7    | 24.0M/41.5M [00:00&lt;00:00, 41.5MB/s]
+ 77%|#######7  | 32.0M/41.5M [00:00&lt;00:00, 49.7MB/s]
+ 92%|#########2| 38.3M/41.5M [00:00&lt;00:00, 45.3MB/s]
+100%|##########| 41.5M/41.5M [00:01&lt;00:00, 43.0MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_pytorch.html b/docs/how_to/compile_models/from_pytorch.html
index 68d0071068..0ae845a3b6 100644
--- a/docs/how_to/compile_models/from_pytorch.html
+++ b/docs/how_to/compile_models/from_pytorch.html
@@ -431,11 +431,12 @@ be unstable.</p>
 Downloading: &quot;https://download.pytorch.org/models/resnet18-f37072fd.pth&quot; to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
 
   0%|          | 0.00/44.7M [00:00&lt;?, ?B/s]
- 22%|##1       | 9.78M/44.7M [00:00&lt;00:00, 103MB/s]
- 44%|####3     | 19.6M/44.7M [00:00&lt;00:00, 82.4MB/s]
- 72%|#######1  | 32.0M/44.7M [00:00&lt;00:00, 93.3MB/s]
- 92%|#########1| 41.0M/44.7M [00:00&lt;00:00, 86.2MB/s]
-100%|##########| 44.7M/44.7M [00:00&lt;00:00, 91.4MB/s]
+ 18%|#7        | 7.99M/44.7M [00:00&lt;00:00, 45.6MB/s]
+ 36%|###5      | 16.0M/44.7M [00:00&lt;00:00, 48.6MB/s]
+ 54%|#####3    | 24.0M/44.7M [00:00&lt;00:00, 58.9MB/s]
+ 72%|#######1  | 32.0M/44.7M [00:00&lt;00:00, 55.0MB/s]
+ 89%|########8 | 39.7M/44.7M [00:00&lt;00:00, 61.9MB/s]
+100%|##########| 44.7M/44.7M [00:00&lt;00:00, 56.8MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_tensorflow.html b/docs/how_to/compile_models/from_tensorflow.html
index d3a2b9f13e..7c870d4295 100644
--- a/docs/how_to/compile_models/from_tensorflow.html
+++ b/docs/how_to/compile_models/from_tensorflow.html
@@ -645,7 +645,7 @@ banana (score = 0.00022)
 desk (score = 0.00019)
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  14.359 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  9.766 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-tensorflow-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7f1d3d1b878694c201c614c807cdebc8/from_tensorflow.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_tensorflow.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/sg_execution_times.html b/docs/how_to/compile_models/sg_execution_times.html
index e2c7f51829..ad348fba50 100644
--- a/docs/how_to/compile_models/sg_execution_times.html
+++ b/docs/how_to/compile_models/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-compile-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:57.860</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
+<p><strong>05:42.882</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 81%" />
@@ -348,44 +348,44 @@
 <col style="width: 8%" />
 </colgroup>
 <tbody>
-<tr class="row-odd"><td><p><a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></td>
-<td><p>01:14.359</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></td>
+<td><p>01:11.203</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></td>
-<td><p>01:13.891</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></td>
+<td><p>01:09.766</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></td>
-<td><p>00:47.706</p></td>
+<td><p>00:45.303</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></td>
-<td><p>00:33.305</p></td>
+<td><p>00:33.366</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></td>
-<td><p>00:31.147</p></td>
+<td><p>00:29.678</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></td>
-<td><p>00:28.093</p></td>
+<td><p>00:26.182</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></td>
-<td><p>00:25.135</p></td>
+<td><p>00:24.741</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></td>
-<td><p>00:23.170</p></td>
+<td><p>00:22.804</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></td>
-<td><p>00:18.628</p></td>
+<td><p>00:17.462</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></td>
-<td><p>00:02.425</p></td>
+<td><p>00:02.377</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/deploy_models/deploy_model_on_android.html b/docs/how_to/deploy_models/deploy_model_on_android.html
index d345cdde23..826b7537b4 100644
--- a/docs/how_to/deploy_models/deploy_model_on_android.html
+++ b/docs/how_to/deploy_models/deploy_model_on_android.html
@@ -662,7 +662,7 @@ to the remote android device.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  16.3089      16.2773      16.5331      16.2045       0.0956
+  16.0491      16.0210      16.2344      15.9366       0.1045
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
index d9cd674a37..8d9002839c 100644
--- a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
+++ b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
@@ -453,21 +453,20 @@ be unstable.</p>
 Downloading: &quot;https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth&quot; to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
 
   0%|          | 0.00/170M [00:00&lt;?, ?B/s]
-  7%|7         | 12.0M/170M [00:00&lt;00:01, 126MB/s]
- 14%|#4        | 24.1M/170M [00:00&lt;00:01, 107MB/s]
- 20%|##        | 34.5M/170M [00:00&lt;00:01, 92.1MB/s]
- 28%|##7       | 47.5M/170M [00:00&lt;00:01, 107MB/s]
- 34%|###4      | 58.0M/170M [00:00&lt;00:01, 104MB/s]
- 40%|####      | 68.2M/170M [00:00&lt;00:01, 101MB/s]
- 46%|####6     | 78.4M/170M [00:00&lt;00:00, 103MB/s]
- 52%|#####1    | 88.3M/170M [00:00&lt;00:00, 101MB/s]
- 58%|#####7    | 98.0M/170M [00:01&lt;00:00, 95.8MB/s]
- 64%|######4   | 109M/170M [00:01&lt;00:00, 103MB/s]
- 70%|#######   | 119M/170M [00:01&lt;00:00, 102MB/s]
- 76%|#######6  | 129M/170M [00:01&lt;00:00, 102MB/s]
- 82%|########1 | 139M/170M [00:01&lt;00:00, 93.9MB/s]
- 89%|########8 | 151M/170M [00:01&lt;00:00, 103MB/s]
- 95%|#########4| 161M/170M [00:01&lt;00:00, 102MB/s]
+  5%|4         | 7.99M/170M [00:00&lt;00:02, 64.4MB/s]
+ 14%|#4        | 24.0M/170M [00:00&lt;00:01, 103MB/s]
+ 24%|##3       | 40.0M/170M [00:00&lt;00:01, 112MB/s]
+ 33%|###2      | 55.8M/170M [00:00&lt;00:00, 130MB/s]
+ 40%|####      | 68.5M/170M [00:00&lt;00:00, 118MB/s]
+ 47%|####7     | 80.1M/170M [00:00&lt;00:00, 94.8MB/s]
+ 56%|#####5    | 94.9M/170M [00:00&lt;00:00, 110MB/s]
+ 63%|######2   | 106M/170M [00:01&lt;00:00, 101MB/s]
+ 69%|######8   | 117M/170M [00:01&lt;00:00, 101MB/s]
+ 75%|#######4  | 127M/170M [00:01&lt;00:00, 95.9MB/s]
+ 80%|########  | 136M/170M [00:01&lt;00:00, 83.1MB/s]
+ 88%|########7 | 149M/170M [00:01&lt;00:00, 95.0MB/s]
+ 93%|#########3| 158M/170M [00:01&lt;00:00, 95.5MB/s]
+100%|#########9| 169M/170M [00:01&lt;00:00, 101MB/s]
 100%|##########| 170M/170M [00:01&lt;00:00, 101MB/s]
 /venv/apache-tvm-py3.7/lib/python3.7/site-packages/torch/nn/functional.py:3897: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
   for i in range(dim)
@@ -566,7 +565,7 @@ torchvision rcnn models.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Get 9 valid boxes
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  27.056 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  11.260 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-object-detection-pytorch-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7795da4b258c8feff986668b95ef57ad/deploy_object_detection_pytorch.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_object_detection_pytorch.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized.html b/docs/how_to/deploy_models/deploy_prequantized.html
index fc4d92aa75..846f1b10a0 100644
--- a/docs/how_to/deploy_models/deploy_prequantized.html
+++ b/docs/how_to/deploy_models/deploy_prequantized.html
@@ -497,8 +497,8 @@ training. Other models require a full post training calibration.</p>
 Downloading: &quot;https://download.pytorch.org/models/mobilenet_v2-b0353104.pth&quot; to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
 
   0%|          | 0.00/13.6M [00:00&lt;?, ?B/s]
- 91%|######### | 12.3M/13.6M [00:00&lt;00:00, 129MB/s]
-100%|##########| 13.6M/13.6M [00:00&lt;00:00, 125MB/s]
+ 90%|########9 | 12.2M/13.6M [00:00&lt;00:00, 81.9MB/s]
+100%|##########| 13.6M/13.6M [00:00&lt;00:00, 87.5MB/s]
 </pre></div>
 </div>
 </div>
@@ -589,7 +589,7 @@ output values are identical out of 1000 outputs from mobilenet v2.</p>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  90.4693      90.3698      97.0182      90.1397       0.6845
+  90.1403      90.0314      93.9069      89.8491       0.4606
 </pre></div>
 </div>
 <div class="admonition note">
@@ -628,7 +628,7 @@ This includes support for the VNNI 8 bit dot product instruction (CascadeLake or
 <div class="section" id="deploy-a-quantized-tflite-model">
 <h2>Deploy a quantized TFLite Model<a class="headerlink" href="#deploy-a-quantized-tflite-model" title="Permalink to this headline">¶</a></h2>
 <p>TODO</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  16.350 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  5.223 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/fb8217c13f4351224c6cf3aacf1a87fc/deploy_prequantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized_tflite.html b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
index fc349c77e7..b7f241b46a 100644
--- a/docs/how_to/deploy_models/deploy_prequantized_tflite.html
+++ b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
@@ -582,7 +582,7 @@ TFLite Top-5 labels: [387 102 386 341 349]
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  121.7389     121.6763     123.5499     120.7672      0.4968
+  117.2063     117.1382     118.6567     115.6592      0.6188
 </pre></div>
 </div>
 <div class="admonition note">
@@ -610,7 +610,7 @@ network for ARM CPU</span></a>.</p></li>
 </ul>
 </div></blockquote>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  30.045 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  27.522 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-tflite-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/56691c7a27d45da61d112276334640d3/deploy_prequantized_tflite.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized_tflite.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_quantized.html b/docs/how_to/deploy_models/deploy_quantized.html
index 5b9ba9e8f0..bdae43e3f3 100644
--- a/docs/how_to/deploy_models/deploy_quantized.html
+++ b/docs/how_to/deploy_models/deploy_quantized.html
@@ -520,7 +520,7 @@ for calibration. But the accuracy might be impacted.</p>
   DeprecationWarning,
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  37.909 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  33.976 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-quantized-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7810ecf51bfc05f7d5e8a400ac3e815d/deploy_quantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_quantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
index 1feda1a777..c94b4f317d 100644
--- a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
+++ b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
@@ -462,24 +462,22 @@ to your device.</p>
 Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
 
   0%|          | 0/132723 [00:00&lt;?, ?KB/s]
-  3%|2         | 3325/132723 [00:00&lt;00:03, 33245.91KB/s]
-  8%|8         | 11015/132723 [00:00&lt;00:02, 58921.21KB/s]
- 13%|#2        | 16908/132723 [00:00&lt;00:01, 58822.34KB/s]
- 19%|#8        | 24725/132723 [00:00&lt;00:01, 66450.81KB/s]
- 25%|##4       | 32756/132723 [00:00&lt;00:01, 71438.14KB/s]
- 31%|###       | 40769/132723 [00:00&lt;00:01, 74389.66KB/s]
- 37%|###6      | 48797/132723 [00:00&lt;00:01, 76310.49KB/s]
- 43%|####2     | 56429/132723 [00:00&lt;00:01, 73766.86KB/s]
- 49%|####8     | 64476/132723 [00:00&lt;00:00, 75811.18KB/s]
- 55%|#####4    | 72473/132723 [00:01&lt;00:00, 77072.91KB/s]
- 61%|######    | 80540/132723 [00:01&lt;00:00, 78159.04KB/s]
- 67%|######6   | 88578/132723 [00:01&lt;00:00, 78826.03KB/s]
- 73%|#######2  | 96469/132723 [00:01&lt;00:00, 77646.95KB/s]
- 79%|#######8  | 104513/132723 [00:01&lt;00:00, 78476.18KB/s]
- 85%|########4 | 112563/132723 [00:01&lt;00:00, 79076.44KB/s]
- 91%|######### | 120571/132723 [00:01&lt;00:00, 79371.77KB/s]
- 97%|#########6| 128553/132723 [00:01&lt;00:00, 79504.81KB/s]
-100%|##########| 132723/132723 [00:01&lt;00:00, 75020.34KB/s]
+  4%|4         | 5865/132723 [00:00&lt;00:02, 58642.65KB/s]
+ 11%|#         | 14343/132723 [00:00&lt;00:01, 74015.15KB/s]
+ 17%|#7        | 22921/132723 [00:00&lt;00:01, 79384.37KB/s]
+ 24%|##3       | 31499/132723 [00:00&lt;00:01, 81906.82KB/s]
+ 30%|###       | 40181/132723 [00:00&lt;00:01, 83677.13KB/s]
+ 37%|###6      | 48729/132723 [00:00&lt;00:00, 84285.80KB/s]
+ 43%|####3     | 57389/132723 [00:00&lt;00:00, 85041.02KB/s]
+ 50%|####9     | 66044/132723 [00:00&lt;00:00, 85518.49KB/s]
+ 56%|#####6    | 74666/132723 [00:00&lt;00:00, 85731.42KB/s]
+ 63%|######2   | 83240/132723 [00:01&lt;00:00, 84079.34KB/s]
+ 69%|######9   | 91874/132723 [00:01&lt;00:00, 84758.89KB/s]
+ 76%|#######5  | 100356/132723 [00:01&lt;00:00, 78778.61KB/s]
+ 82%|########2 | 109027/132723 [00:01&lt;00:00, 81042.84KB/s]
+ 88%|########8 | 117203/132723 [00:01&lt;00:00, 65725.00KB/s]
+ 95%|#########4| 125791/132723 [00:01&lt;00:00, 70780.02KB/s]
+100%|##########| 132723/132723 [00:01&lt;00:00, 78266.41KB/s]
 </pre></div>
 </div>
 <p>Create TVM runtime and do inference
@@ -518,7 +516,7 @@ Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from h
 <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" srcset="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" alt="deploy ssd gluoncv" class = "sphx-glr-single-img"/><p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 3 minutes  9.754 seconds)</p>
+<img src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" srcset="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" alt="deploy ssd gluoncv" class = "sphx-glr-single-img"/><p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  57.338 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-ssd-gluoncv-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/cccb17d28e5e8b2e94ea8cd5ec59f6ed/deploy_ssd_gluoncv.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_ssd_gluoncv.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/sg_execution_times.html b/docs/how_to/deploy_models/sg_execution_times.html
index b9bf42d2a3..6ab830a7e6 100644
--- a/docs/how_to/deploy_models/sg_execution_times.html
+++ b/docs/how_to/deploy_models/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-deploy-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>13:30.469</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
+<p><strong>12:40.661</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 86%" />
@@ -349,39 +349,39 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></td>
-<td><p>03:27.056</p></td>
+<td><p>03:11.260</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></td>
-<td><p>03:09.754</p></td>
+<td><p>02:57.338</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></td>
-<td><p>02:30.045</p></td>
+<td><p>02:27.522</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></td>
-<td><p>01:37.909</p></td>
+<td><p>01:33.976</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></td>
-<td><p>01:16.350</p></td>
+<td><p>01:05.223</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></td>
-<td><p>00:37.348</p></td>
+<td><p>00:35.402</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_model_on_nano.html#sphx-glr-how-to-deploy-models-deploy-model-on-nano-py"><span class="std std-ref">Deploy the Pretrained Model on Jetson Nano</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_nano.py</span></code>)</p></td>
-<td><p>00:26.329</p></td>
+<td><p>00:25.209</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></td>
-<td><p>00:25.671</p></td>
+<td><p>00:24.724</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></td>
-<td><p>00:00.007</p></td>
+<td><p>00:00.006</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/extend_tvm/bring_your_own_datatypes.html b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
index 647ac81679..22d29278a1 100644
--- a/docs/how_to/extend_tvm/bring_your_own_datatypes.html
+++ b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
@@ -621,7 +621,7 @@ In this alpha state of the Bring Your Own Datatypes framework, we have not imple
 <span class="n">module</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">params</span></a> <span class="o">=</span> <span class="n">get_mobilenet</span><span class="p">()</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip3d1403ad-5b47-4a55-84bd-6f0dae493bdf from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipee89f252-ece6-4f02-9764-1380ae97e3fb from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 </pre></div>
 </div>
 <p>It’s easy to execute MobileNet with native TVM:</p>
diff --git a/docs/how_to/extend_tvm/sg_execution_times.html b/docs/how_to/extend_tvm/sg_execution_times.html
index 5b3aff1ff0..a0df35d6d0 100644
--- a/docs/how_to/extend_tvm/sg_execution_times.html
+++ b/docs/how_to/extend_tvm/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-extend-tvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:49.285</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
+<p><strong>00:46.825</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -349,15 +349,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></td>
-<td><p>00:45.681</p></td>
+<td><p>00:43.462</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></td>
-<td><p>00:02.519</p></td>
+<td><p>00:02.347</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></td>
-<td><p>00:01.077</p></td>
+<td><p>00:01.007</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></td>
diff --git a/docs/how_to/extend_tvm/use_pass_instrument.html b/docs/how_to/extend_tvm/use_pass_instrument.html
index 20ba49e38b..fed5602bf1 100644
--- a/docs/how_to/extend_tvm/use_pass_instrument.html
+++ b/docs/how_to/extend_tvm/use_pass_instrument.html
@@ -525,10 +525,10 @@ profile the execution time of each passes.</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 7636us [7636us] (47.04%; 47.04%)
-FoldScaleAxis: 8598us [8us] (52.96%; 52.96%)
-        FoldConstant: 8589us [1731us] (52.91%; 99.91%)
-                InferType: 6858us [6858us] (42.25%; 79.84%)
+InferType: 7103us [7103us] (46.13%; 46.13%)
+FoldScaleAxis: 8295us [6us] (53.87%; 53.87%)
+        FoldConstant: 8289us [1675us] (53.83%; 99.93%)
+                InferType: 6614us [6614us] (42.95%; 79.80%)
 </pre></div>
 </div>
 </div>
@@ -550,10 +550,10 @@ Refer to following sections and <a class="reference internal" href="../../refere
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 7055us [7055us] (45.37%; 45.37%)
-FoldScaleAxis: 8496us [8us] (54.63%; 54.63%)
-        FoldConstant: 8489us [1739us] (54.58%; 99.91%)
-                InferType: 6749us [6749us] (43.40%; 79.51%)
+InferType: 6659us [6659us] (45.27%; 45.27%)
+FoldScaleAxis: 8050us [5us] (54.73%; 54.73%)
+        FoldConstant: 8045us [1685us] (54.69%; 99.94%)
+                InferType: 6360us [6360us] (43.24%; 79.06%)
 </pre></div>
 </div>
 <p>Register empty list to clear existing instruments.</p>
diff --git a/docs/how_to/optimize_operators/opt_conv_cuda.html b/docs/how_to/optimize_operators/opt_conv_cuda.html
index d0a448fe0b..add39942f1 100644
--- a/docs/how_to/optimize_operators/opt_conv_cuda.html
+++ b/docs/how_to/optimize_operators/opt_conv_cuda.html
@@ -577,7 +577,7 @@ latency of convolution.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Convolution: </span><span class="si">%f</span><span class="s2"> ms&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span> <span class="o">*</span> <span cl [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 34.209918 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 46.161792 ms
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-optimize-operators-opt-conv-cuda-py">
diff --git a/docs/how_to/optimize_operators/opt_conv_tensorcore.html b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
index fd4b229b49..80c0299eef 100644
--- a/docs/how_to/optimize_operators/opt_conv_tensorcore.html
+++ b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
@@ -916,7 +916,7 @@ be able to run on our build server</p>
     <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;conv2d with tensor core: </span><span class="si">%f</span><span class="s2"> ms&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span> <span class="o">* [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 13.365565 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 12.319292 ms
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/optimize_operators/opt_gemm.html b/docs/how_to/optimize_operators/opt_gemm.html
index 96fbedb456..55e5ecbd49 100644
--- a/docs/how_to/optimize_operators/opt_gemm.html
+++ b/docs/how_to/optimize_operators/opt_gemm.html
@@ -474,8 +474,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Baseline: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.020019
-Baseline: 3.320755
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018783
+Baseline: 3.524702
 </pre></div>
 </div>
 <p>In TVM, we can always inspect lower level IR to debug or optimize our schedule.
@@ -535,7 +535,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt1: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.330129
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.290261
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -602,7 +602,7 @@ vastly.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt2: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.355017
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.330394
 </pre></div>
 </div>
 <p>Here is the generated IR after vectorization.</p>
@@ -663,7 +663,7 @@ the access pattern for A matrix is more cache friendly.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt3: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.131375
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.114227
 </pre></div>
 </div>
 <p>Here is the generated IR after loop permutation.</p>
@@ -746,7 +746,7 @@ flattening.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt4: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.111834
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.108937
 </pre></div>
 </div>
 <p>Here is the generated IR after array packing.</p>
@@ -832,7 +832,7 @@ write to C when all the block results are ready.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt5: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">evaluator</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.112285
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.115411
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -922,7 +922,7 @@ write to C when all the block results are ready.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Opt6: </span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">opt6_time</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.148514
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.153141
 </pre></div>
 </div>
 <p>Here is the generated IR after parallelization.</p>
diff --git a/docs/how_to/optimize_operators/sg_execution_times.html b/docs/how_to/optimize_operators/sg_execution_times.html
index ee90bfc3a5..fee19a1797 100644
--- a/docs/how_to/optimize_operators/sg_execution_times.html
+++ b/docs/how_to/optimize_operators/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-optimize-operators-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:35.640</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
+<p><strong>00:35.473</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -349,15 +349,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></td>
-<td><p>00:33.094</p></td>
+<td><p>00:32.828</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></td>
-<td><p>00:01.489</p></td>
+<td><p>00:01.516</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></td>
-<td><p>00:01.057</p></td>
+<td><p>00:01.129</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
index 46645a5db9..eb42fb5e2e 100644
--- a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
+++ b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autoscheduler-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>09:29.896</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
+<p><strong>09:11.761</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 85%" />
@@ -349,27 +349,27 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></td>
-<td><p>05:46.298</p></td>
+<td><p>05:45.185</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></td>
-<td><p>01:35.123</p></td>
+<td><p>01:31.589</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></td>
-<td><p>01:04.854</p></td>
+<td><p>01:02.315</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></td>
-<td><p>00:39.686</p></td>
+<td><p>00:29.958</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></td>
-<td><p>00:12.324</p></td>
+<td><p>00:11.800</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></td>
-<td><p>00:11.611</p></td>
+<td><p>00:10.915</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
index b98eba3462..d4b469b549 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
@@ -1017,7 +1017,7 @@ cooperative fetching, unrolling and operator fusion.</p>
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.356 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.353 ms
 </pre></div>
 </div>
 </div>
@@ -1580,7 +1580,7 @@ In the example below we resume the status and do more 5 trials.</p>
 Get devices for measurement successfully!
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes  46.298 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 5 minutes  45.185 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e3e540f3b477c0c52d8eb73e674e8ffd/tune_conv2d_layer_cuda.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_conv2d_layer_cuda.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
index 1e042c3f5f..0fe47fd78b 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
@@ -915,7 +915,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-   8.1887       8.1887       8.1938       8.1837       0.0041
+   8.2103       8.2062       8.2239       8.2007       0.0099
 </pre></div>
 </div>
 </div>
@@ -937,7 +937,7 @@ to learn how to use the RPC Tracker and RPC Server.
 To use the RPC Tracker in auto-scheduler, replace the runner in <code class="code docutils literal notranslate"><span class="pre">TuningOptions</span></code>
 with <a class="reference internal" href="../../reference/api/python/auto_scheduler.html#tvm.auto_scheduler.RPCRunner" title="tvm.auto_scheduler.RPCRunner"><code class="xref any py py-class docutils literal notranslate"><span class="pre">auto_scheduler.RPCRunner</span></code></a>.</p></li>
 </ol>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  4.854 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  2.315 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-network-cuda-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/eafe360d52540634c9eea0fa89e804bd/tune_network_cuda.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_network_cuda.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
index 199216ebeb..edde1e746c 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
@@ -934,7 +934,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  765.1206     764.9800     765.5755     764.8065      0.3293
+  755.0235     755.1044     755.4110     754.5552      0.3540
 </pre></div>
 </div>
 </div>
@@ -956,7 +956,7 @@ to learn how to use the RPC Tracker and RPC Server.
 To use the RPC Tracker in auto-scheduler, replace the runner in <code class="code docutils literal notranslate"><span class="pre">TuningOptions</span></code>
 with <a class="reference internal" href="../../reference/api/python/auto_scheduler.html#tvm.auto_scheduler.RPCRunner" title="tvm.auto_scheduler.RPCRunner"><code class="xref any py py-class docutils literal notranslate"><span class="pre">auto_scheduler.RPCRunner</span></code></a>.</p></li>
 </ol>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  35.123 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  31.589 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-network-x86-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e416b94ca1090b0897c0f6e0df95b911/tune_network_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_network_x86.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
index 23392f3e99..6632df34d3 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
@@ -632,121 +632,76 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
              placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
              compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
   buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-  preflattened_buffer_map = {placeholder_7: placeholder_15: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_9: placeholder_16: Buffer(placeholder_14, float32, [128, 512], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_5: placeholder_18: Buffer(placeholder_10, float32, [128, 256], []), placeholder_8: placeholder_19: Buffer(placeholder_13, int32, [33], [])} {
+  preflattened_buffer_map = {placeholder_6: placeholder_15: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_7: placeholder_16: Buffer(placeholder_12, int32, [4916], []), placeholder_8: placeholder_17: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_18: Buffer(placeholder_10, float32, [128, 256], []), placeholder_9: placeholder_19: Buffer(placeholder_14, float32, [128, 512], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], [])} {
   for (i0.outer.i1.outer.fused: int32, 0, 256) &quot;parallel&quot; {
-    allocate(compute_4: Pointer(global float32), float32, [256]), storage_scope = global {
-      for (i.outer.inner: int32, 0, 4) {
-        for (nb_j.inner: int32, 0, 2) {
-          let cse_var_2: int32 = ((i.outer.inner*64) + (nb_j.inner*16))
-          let cse_var_1: int32 = ((floormod(i0.outer.i1.outer.fused, 16)*2) + nb_j.inner)
+    allocate(compute_4: Pointer(global float32), float32, [512]), storage_scope = global {
+      for (i.inner.init: int32, 0, 32) {
+        let cse_var_1: int32 = (i.inner.init*16)
+         {
+          compute_5: Buffer(compute_4, float32, [512], [])[cse_var_1] = 0f32
+          compute_5[(cse_var_1 + 1)] = 0f32
+          compute_5[(cse_var_1 + 2)] = 0f32
+          compute_5[(cse_var_1 + 3)] = 0f32
+          compute_5[(cse_var_1 + 4)] = 0f32
+          compute_5[(cse_var_1 + 5)] = 0f32
+          compute_5[(cse_var_1 + 6)] = 0f32
+          compute_5[(cse_var_1 + 7)] = 0f32
+          compute_5[(cse_var_1 + 8)] = 0f32
+          compute_5[(cse_var_1 + 9)] = 0f32
+          compute_5[(cse_var_1 + 10)] = 0f32
+          compute_5[(cse_var_1 + 11)] = 0f32
+          compute_5[(cse_var_1 + 12)] = 0f32
+          compute_5[(cse_var_1 + 13)] = 0f32
+          compute_5[(cse_var_1 + 14)] = 0f32
+          compute_5[(cse_var_1 + 15)] = 0f32
+        }
+      }
+      for (elem_idx: int32, 0, let cse_var_2: int32 = floordiv(floormod(i0.outer.i1.outer.fused, 64), 2) in (placeholder_3[(cse_var_2 + 1)] - placeholder_3[cse_var_2])) {
+        for (i.inner: int32, 0, 32) {
+          let cse_var_21: int32 = (i.inner*16)
+          let cse_var_20: int32 = (elem_idx*16)
+          let cse_var_19: int32 = floordiv(floormod(i0.outer.i1.outer.fused, 64), 2)
+          let cse_var_18: int32 = (cse_var_21 + 9)
+          let cse_var_17: int32 = (cse_var_21 + 8)
+          let cse_var_16: int32 = (cse_var_21 + 7)
+          let cse_var_15: int32 = (cse_var_21 + 6)
+          let cse_var_14: int32 = (cse_var_21 + 5)
+          let cse_var_13: int32 = (cse_var_21 + 4)
+          let cse_var_12: int32 = (cse_var_21 + 3)
+          let cse_var_11: int32 = (cse_var_21 + 2)
+          let cse_var_10: int32 = (cse_var_21 + 15)
+          let cse_var_9: int32 = (cse_var_21 + 14)
+          let cse_var_8: int32 = (cse_var_21 + 13)
+          let cse_var_7: int32 = (cse_var_21 + 12)
+          let cse_var_6: int32 = (cse_var_21 + 11)
+          let cse_var_5: int32 = (cse_var_21 + 10)
+          let cse_var_4: int32 = (cse_var_21 + 1)
+          let cse_var_3: int32 = ((floordiv(i0.outer.i1.outer.fused, 64)*8192) + (i.inner*256))
            {
-            compute_5: Buffer(compute_4, float32, [256], [])[cse_var_2] = 0f32
-            compute_5[(cse_var_2 + 1)] = 0f32
-            compute_5[(cse_var_2 + 2)] = 0f32
-            compute_5[(cse_var_2 + 3)] = 0f32
-            compute_5[(cse_var_2 + 4)] = 0f32
-            compute_5[(cse_var_2 + 5)] = 0f32
-            compute_5[(cse_var_2 + 6)] = 0f32
-            compute_5[(cse_var_2 + 7)] = 0f32
-            compute_5[(cse_var_2 + 8)] = 0f32
-            compute_5[(cse_var_2 + 9)] = 0f32
-            compute_5[(cse_var_2 + 10)] = 0f32
-            compute_5[(cse_var_2 + 11)] = 0f32
-            compute_5[(cse_var_2 + 12)] = 0f32
-            compute_5[(cse_var_2 + 13)] = 0f32
-            compute_5[(cse_var_2 + 14)] = 0f32
-            compute_5[(cse_var_2 + 15)] = 0f32
-            compute_5[(cse_var_2 + 32)] = 0f32
-            compute_5[(cse_var_2 + 33)] = 0f32
-            compute_5[(cse_var_2 + 34)] = 0f32
-            compute_5[(cse_var_2 + 35)] = 0f32
-            compute_5[(cse_var_2 + 36)] = 0f32
-            compute_5[(cse_var_2 + 37)] = 0f32
-            compute_5[(cse_var_2 + 38)] = 0f32
-            compute_5[(cse_var_2 + 39)] = 0f32
-            compute_5[(cse_var_2 + 40)] = 0f32
-            compute_5[(cse_var_2 + 41)] = 0f32
-            compute_5[(cse_var_2 + 42)] = 0f32
-            compute_5[(cse_var_2 + 43)] = 0f32
-            compute_5[(cse_var_2 + 44)] = 0f32
-            compute_5[(cse_var_2 + 45)] = 0f32
-            compute_5[(cse_var_2 + 46)] = 0f32
-            compute_5[(cse_var_2 + 47)] = 0f32
-            for (elem_idx: int32, 0, (placeholder_3[(cse_var_1 + 1)] - placeholder_3[cse_var_1])) {
-              let cse_var_35: int32 = (elem_idx*16)
-              let cse_var_34: int32 = (cse_var_2 + 9)
-              let cse_var_33: int32 = (cse_var_2 + 8)
-              let cse_var_32: int32 = (cse_var_2 + 7)
-              let cse_var_31: int32 = (cse_var_2 + 6)
-              let cse_var_30: int32 = (cse_var_2 + 5)
-              let cse_var_29: int32 = (cse_var_2 + 47)
-              let cse_var_28: int32 = (cse_var_2 + 46)
-              let cse_var_27: int32 = (cse_var_2 + 45)
-              let cse_var_26: int32 = (cse_var_2 + 44)
-              let cse_var_25: int32 = (cse_var_2 + 43)
-              let cse_var_24: int32 = (cse_var_2 + 42)
-              let cse_var_23: int32 = (cse_var_2 + 41)
-              let cse_var_22: int32 = (cse_var_2 + 40)
-              let cse_var_21: int32 = (cse_var_2 + 4)
-              let cse_var_20: int32 = (cse_var_2 + 39)
-              let cse_var_19: int32 = (cse_var_2 + 38)
-              let cse_var_18: int32 = (cse_var_2 + 37)
-              let cse_var_17: int32 = (cse_var_2 + 36)
-              let cse_var_16: int32 = (cse_var_2 + 35)
-              let cse_var_15: int32 = (cse_var_2 + 34)
-              let cse_var_14: int32 = (cse_var_2 + 33)
-              let cse_var_13: int32 = (cse_var_2 + 32)
-              let cse_var_12: int32 = (cse_var_2 + 3)
-              let cse_var_11: int32 = (cse_var_2 + 2)
-              let cse_var_10: int32 = (cse_var_2 + 15)
-              let cse_var_9: int32 = (cse_var_2 + 14)
-              let cse_var_8: int32 = (cse_var_2 + 13)
-              let cse_var_7: int32 = (cse_var_2 + 12)
-              let cse_var_6: int32 = (cse_var_2 + 11)
-              let cse_var_5: int32 = (cse_var_2 + 10)
-              let cse_var_4: int32 = (cse_var_2 + 1)
-              let cse_var_3: int32 = ((floordiv(i0.outer.i1.outer.fused, 16)*2048) + (i.outer.inner*512))
-               {
-                compute_5[cse_var_2] = (compute_5[cse_var_2] + (placeholder_1[((placeholder_3[cse_var_1]*16) + cse_var_35)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 1)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 2)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 3)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_21] = (compute_5[cse_var_21] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 4)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_30] = (compute_5[cse_var_30] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 5)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_31] = (compute_5[cse_var_31] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 6)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_32] = (compute_5[cse_var_32] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 7)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_33] = (compute_5[cse_var_33] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 8)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_34] = (compute_5[cse_var_34] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 9)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 10)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 11)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 12)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 13)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 14)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 15)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)])], 0f32)))
-                compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[((placeholder_3[cse_var_1]*16) + cse_var_35)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 1)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 2)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 3)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 4)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 5)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_19] = (compute_5[cse_var_19] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 6)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_20] = (compute_5[cse_var_20] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 7)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_22] = (compute_5[cse_var_22] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 8)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_23] = (compute_5[cse_var_23] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 9)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_24] = (compute_5[cse_var_24] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 10)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_25] = (compute_5[cse_var_25] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 11)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_26] = (compute_5[cse_var_26] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 12)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_27] = (compute_5[cse_var_27] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 13)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_28] = (compute_5[cse_var_28] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 14)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-                compute_5[cse_var_29] = (compute_5[cse_var_29] + (placeholder_1[(((placeholder_3[cse_var_1]*16) + cse_var_35) + 15)]*max(placeholder[((cse_var_3 + placeholder_2[(placeholder_3[cse_var_1] + elem_idx)]) + 256)], 0f32)))
-              }
-            }
+            compute_5[cse_var_21] = (compute_5[cse_var_21] + (placeholder_1[((placeholder_3[cse_var_19]*16) + cse_var_20)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_4] = (compute_5[cse_var_4] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 1)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_11] = (compute_5[cse_var_11] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 2)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_12] = (compute_5[cse_var_12] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 3)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_13] = (compute_5[cse_var_13] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 4)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_14] = (compute_5[cse_var_14] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 5)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_15] = (compute_5[cse_var_15] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 6)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_16] = (compute_5[cse_var_16] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 7)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_17] = (compute_5[cse_var_17] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 8)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_18] = (compute_5[cse_var_18] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 9)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_5] = (compute_5[cse_var_5] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 10)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_6] = (compute_5[cse_var_6] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 11)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_7] = (compute_5[cse_var_7] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 12)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_8] = (compute_5[cse_var_8] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 13)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_9] = (compute_5[cse_var_9] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 14)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
+            compute_5[cse_var_10] = (compute_5[cse_var_10] + (placeholder_1[(((placeholder_3[cse_var_19]*16) + cse_var_20) + 15)]*max(placeholder[(cse_var_3 + placeholder_2[(placeholder_3[cse_var_19] + elem_idx)])], 0f32)))
           }
         }
       }
-      for (i0.inner: int32, 0, 8) {
-        let cse_var_36: int32 = (((floordiv(i0.outer.i1.outer.fused, 16)*4096) + (i0.inner*512)) + (floormod(i0.outer.i1.outer.fused, 16)*32))
-        compute[ramp(cse_var_36, 1, 32)] = max((compute_5[ramp((i0.inner*32), 1, 32)] + placeholder_4[ramp(cse_var_36, 1, 32)]), broadcast(0f32, 32))
+      for (i0.inner: int32, 0, 32) {
+        let cse_var_23: int32 = floormod(i0.outer.i1.outer.fused, 64)
+        let cse_var_24: int32 = (cse_var_23*8)
+        let cse_var_22: int32 = (((floordiv(i0.outer.i1.outer.fused, 64)*16384) + (i0.inner*512)) + cse_var_24)
+        compute[ramp(cse_var_22, 1, 8)] = max((compute_5[ramp((((i0.inner*16) + cse_var_24) - (floordiv(cse_var_23, 2)*16)), 1, 8)] + placeholder_4[ramp(cse_var_22, 1, 8)]), broadcast(0f32, 8))
       }
     }
   }
@@ -784,7 +739,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 3.544 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 3.614 ms
 </pre></div>
 </div>
 <div class="admonition note">
diff --git a/docs/how_to/tune_with_autotvm/sg_execution_times.html b/docs/how_to/tune_with_autotvm/sg_execution_times.html
index 864df0afdd..96eff3bbe8 100644
--- a/docs/how_to/tune_with_autotvm/sg_execution_times.html
+++ b/docs/how_to/tune_with_autotvm/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:56.099</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
+<p><strong>00:39.161</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -349,22 +349,22 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></td>
-<td><p>00:56.062</p></td>
+<td><p>00:39.126</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></td>
-<td><p>00:00.021</p></td>
+<td><p>00:00.020</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></td>
 <td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></td>
 <td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></td>
 <td><p>00:00.005</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
diff --git a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
index 129b2d792d..dc2bc590a6 100644
--- a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
+++ b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
@@ -689,10 +689,9 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 1, 32]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 16, 8]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2821466
-No: 2   GFLOPS: 6.97/6.97       result: MeasureResult(costs=(0.03322158425,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.620582342147827, timestamp=1668549250.7313094)       [(&#39;tile_f&#39;, [-1, 2, 1, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 32, 1]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,8345205
-No: 3   GFLOPS: 12.53/12.53     result: MeasureResult(costs=(0.018477077333333335,), error_no=MeasureErrorNo.NO_ERROR, all_cost=10.270890235900879, timestamp=1668549252.4500968)       [(&#39;tile_f&#39;, [-1, 8, 8, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10272490
-No: 4   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 16, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 16, 4]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,3954303
+No: 2   GFLOPS: 154.18/154.18   result: MeasureResult(costs=(0.0015014968333333333,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.9308342933654785, timestamp=1668556463.7666306)      [(&#39;tile_f&#39;, [-1, 8, 8, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,264690
+No: 3   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -814,161 +813,132 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 16, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 16, 8]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6692404
-No: 5   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
-  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 738, in __call__
-    yield remote, remote.load_module(os.path.split(build_result.filename)[1])
-  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 702, in run_through_rpc
-    costs = time_f(*args).results
-  File &quot;/workspace/python/tvm/runtime/module.py&quot;, line 357, in evaluator
-    blob = feval(*args)
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 8, 1, 32]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 16]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,903288
+No: 4   GFLOPS: 150.94/154.18   result: MeasureResult(costs=(0.0015337318636363638,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.028738498687744, timestamp=1668556465.3167734)       [(&#39;tile_f&#39;, [-1, 1, 2, 2]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 8]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4552304
+No: 5   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
+  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
+    func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
+  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
+    func = build(s, args, target_host=task.target_host, runtime=runtime)
+  File &quot;/workspace/python/tvm/driver/build_module.py&quot;, line 227, in build
+    input_mod = lower(inputs, args, name=name, binds=binds)
+  File &quot;/workspace/python/tvm/driver/build_module.py&quot;, line 134, in lower
+    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
-  File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 262, in tvm._ffi._cy3.core.FuncCall
-  File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 251, in tvm._ffi._cy3.core.FuncCall3
+  File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 276, in tvm._ffi._cy3.core.FuncCall
   File &quot;tvm/_ffi/_cython/./base.pxi&quot;, line 181, in tvm._ffi._cy3.core.CHECK_CALL
 tvm._ffi.base.TVMError: Traceback (most recent call last):
-  4: TVMFuncCall
+  24: TVMFuncCall
         at ../src/runtime/c_runtime_api.cc:477
-  3: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
         at ../include/tvm/runtime/packed_func.h:1217
-  2: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
-        at ../src/runtime/rpc/rpc_module.cc:129
-  1: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function&lt;void (tvm::runtime::TVMArgs)&gt; const&amp;)
-        at ../src/runtime/rpc/rpc_endpoint.cc:1012
-  0: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function&lt;void (tvm::runtime::TVMArgs)&gt;)
-        at ../src/runtime/rpc/rpc_endpoint.cc:804
-  File &quot;../src/runtime/rpc/rpc_endpoint.cc&quot;, line 804
-TVMError:
----------------------------------------------------------------
-An error occurred during the execution of TVM.
-For more information, please see: https://tvm.apache.org/docs/errors.html
----------------------------------------------------------------
-  Check failed: (code == RPCCode::kReturn) is false: code=kShutdown
-
-During handling of the above exception, another exception occurred:
-
-Traceback (most recent call last):
-  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 702, in run_through_rpc
-    costs = time_f(*args).results
-  File &quot;/usr/lib/python3.7/contextlib.py&quot;, line 130, in __exit__
-    self.gen.throw(type, value, traceback)
-  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 742, in __call__
-    remote.remove(build_result.filename)
-  File &quot;/workspace/python/tvm/rpc/client.py&quot;, line 144, in remove
-    self._remote_funcs[&quot;remove&quot;] = self.get_function(&quot;tvm.rpc.server.remove&quot;)
-  File &quot;/workspace/python/tvm/rpc/client.py&quot;, line 72, in get_function
-    return self._sess.get_function(name)
-  File &quot;/workspace/python/tvm/runtime/module.py&quot;, line 171, in get_function
-    self.handle, c_str(name), ctypes.c_int(query_imports), ctypes.byref(ret_handle)
-  File &quot;/workspace/python/tvm/_ffi/base.py&quot;, line 348, in check_call
-    raise get_last_ffi_error()
-tvm._ffi.base.TVMError: Traceback (most recent call last):
-  52: 0xffffffffffffffff
-  51: _start
-  50: __libc_start_main
-  49: _Py_UnixMain
-  48: 0x0000000000650da0
-  47: 0x0000000000650afa
-  46: _PyFunction_FastCallDict
-  45: _PyEval_EvalCodeWithName
-  44: _PyEval_EvalFrameDefault
-  43: _PyFunction_FastCallKeywords
-  42: _PyEval_EvalCodeWithName
-  41: _PyEval_EvalFrameDefault
-  40: _PyMethodDef_RawFastCallKeywords
-  39: 0x0000000000546369
-  38: _PyEval_EvalCodeWithName
-  37: _PyEval_EvalFrameDefault
-  36: _PyFunction_FastCallKeywords
-  35: _PyEval_EvalCodeWithName
-  34: _PyEval_EvalFrameDefault
-  33: _PyFunction_FastCallDict
-  32: _PyEval_EvalCodeWithName
-  31: _PyEval_EvalFrameDefault
-  30: _PyObject_FastCallDict
-  29: 0x00000000004c06e1
-  28: _PyFunction_FastCallDict
-  27: _PyEval_EvalFrameDefault
-  26: _PyMethodDescr_FastCallKeywords
-  25: 0x00000000005dcb58
-  24: 0x00000000005dc83f
-  23: 0x00000000004ba127
-  22: _PyEval_EvalFrameDefault
-  21: _PyFunction_FastCallKeywords
-  20: _PyEval_EvalFrameDefault
-  19: _PyFunction_FastCallKeywords
-  18: _PyEval_EvalFrameDefault
-  17: _PyFunction_FastCallKeywords
-  16: _PyEval_EvalCodeWithName
-  15: _PyEval_EvalFrameDefault
-  14: 0x0000000000537c30
-  13: _PyObject_FastCallKeywords
-  12: 0x00007fe2cb1f3fa2
-  11: _ctypes_callproc
-  10: ffi_call
-  9: ffi_call_unix64
-  8: TVMModGetFunction
-        at ../src/runtime/c_runtime_api.cc:408
-  7: tvm::runtime::ModuleNode::GetFunction(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, bool)
-        at ../src/runtime/module.cc:66
-  6: tvm::runtime::RPCModuleNode::GetFunction(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, tvm::runtime::ObjectPtr&lt;tvm::runtime::Object&gt; const&amp;)
-        at ../src/runtime/rpc/rpc_module.cc:185
-  5: tvm::runtime::RPCClientSession::GetFunction(std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;)
-        at ../src/runtime/rpc/rpc_endpoint.cc:1007
-  4: tvm::runtime::TVMRetValue tvm::runtime::RPCEndpoint::SysCallRemote&lt;std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;&gt;(tvm::runtime::RPCCode, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;)
-        at ../src/runtime/rpc/rpc_endpoint.h:223
-  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()&lt;int, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;&gt;(int&amp;&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;) const
+  22: Call
+        at ../include/tvm/runtime/packed_func.h:1213
+  21: operator()
+        at ../include/tvm/runtime/packed_func.h:1731
+  20: unpack_call&lt;tvm::IRModule, 5, tvm::&lt;lambda(tvm::te::Schedule, const tvm::runtime::Array&lt;tvm::runtime::ObjectRef&gt;&amp;, const tvm::runtime::String&amp;, const tvm::runtime::Map&lt;tvm::te::Tensor, tvm::tir::Buffer&gt;&amp;, bool)&gt; &gt;
+        at ../include/tvm/runtime/packed_func.h:1671
+  19: run&lt;&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  18: run&lt;tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  17: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  16: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  15: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  14: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1646
+  13: operator()
+        at ../src/driver/driver_api.cc:388
+  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array&lt;tvm::runtime::ObjectRef, void&gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::unordered_map&lt;tvm::te::Tensor, tvm::tir::Buffer, std::hash&lt;tvm::te::Tensor&gt;, std::equal_to&lt;tvm::te::Tensor&gt;, std::allocator&lt;std::pair&lt;tvm::te::Tensor const, tvm::tir::Buffer&gt; &gt; &gt; const&amp;, tvm::GlobalVarSupply, bool)
+        at ../src/driver/driver_api.cc:374
+  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array&lt;tvm::transform::Pass, void&gt;)
+        at ../src/driver/driver_api.cc:269
+  10: tvm::transform::Pass::operator()(tvm::IRModule) const
+        at ../src/ir/transform.cc:258
+  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:274
+  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:453
+  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:274
+  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/tir/ir/transform.cc:100
+  5: tvm::runtime::TypedPackedFunc&lt;tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)&gt;::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
+        at ../include/tvm/runtime/packed_func.h:1750
+  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher&lt;tvm::tir::PrimFunc&gt;::run&lt;tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext&gt;(tvm::runtime::PackedFunc const&amp;, tvm::tir::PrimFunc&amp;&amp;, tvm::IRModule&amp;&amp;, tvm::transform::PassContext&amp;&amp;)
+        at ../include/tvm/runtime/packed_func.h:1694
+  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()&lt;tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext&gt;(tvm::tir::PrimFunc&amp;&amp;, tvm::IRModule&amp;&amp;, tvm::transform::PassContext&amp;&amp;) const
         at ../include/tvm/runtime/packed_func.h:1618
   2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
         at ../include/tvm/runtime/packed_func.h:1217
   1: Call
         at ../include/tvm/runtime/packed_func.h:1213
   0: operator()
-        at ../src/runtime/rpc/rpc_endpoint.cc:684
-  File &quot;../src/runtime/rpc/rpc_endpoint.cc&quot;, line 684
-TVMError:
----------------------------------------------------------------
-An error occurred during the execution of TVM.
-For more information, please see: https://tvm.apache.org/docs/errors.html
----------------------------------------------------------------
-  Check failed: (code == RPCCode::kReturn) is false: code=1
+        at ../src/runtime/c_runtime_api.cc:534
+  File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
+  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
+    raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
 
 Traceback (most recent call last):
-  52: 0xffffffffffffffff
-  51: _start
-  50: __libc_start_main
-  49: _Py_UnixMain
-  48: 0x0000000000650da0
-  47: 0x0000000000650afa
-  46: _PyFunction_FastCallDict
-  45: _PyEval_EvalCodeWithName
-  44: _PyEval_EvalFrameDefault
-  43: _PyFunction_FastCallKeywords
-  42: _PyEval_EvalCodeWithName
-  41: _PyEval_EvalFrameDefault
-  40: _PyMethodDef_RawFastCallKeywords
-  39: 0x0000000000546369
-  38: _PyEval_EvalCodeWithName
-  37: _PyEval_EvalFrameDefault
-  36: _PyFunction_FastCallKeywords
-  35: _PyEval_EvalCodeWithName
-  34: _PyEval_EvalFrameDefault
-  33: _PyFunction_FastCallDict
-  32: _PyEval_EvalCodeWithName
-  31: _PyEval_EvalFrameDefault
-  30: _PyObject_FastCallDict
-  29: 0x00000000004c06e1
-  28: _PyFunction_FastCallDict
-  27: _PyEval_EvalFrameDefault
-  26: _PyMethodDescr_FastCallKeywords
-  25: 0x00000000005dcb58
-  24: 0x00000000005dc83f
-  23: 0x00000000004ba127
-  22: _PyEval_EvalFrameDefault
-  21: _PyFunction_FastCallKeywords
-  20: _PyEval_EvalFrameDefault
-  19: _PyFunction_FastCall      [(&#39;tile_f&#39;, [-1, 8, 1, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 8, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6790027
-No: 6   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+  24: TVMFuncCall
+        at ../src/runtime/c_runtime_api.cc:477
+  23: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+        at ../include/tvm/runtime/packed_func.h:1217
+  22: Call
+        at ../include/tvm/runtime/packed_func.h:1213
+  21: operator()
+        at ../include/tvm/runtime/packed_func.h:1731
+  20: unpack_call&lt;tvm::IRModule, 5, tvm::&lt;lambda(tvm::te::Schedule, const tvm::runtime::Array&lt;tvm::runtime::ObjectRef&gt;&amp;, const tvm::runtime::String&amp;, const tvm::runtime::Map&lt;tvm::te::Tensor, tvm::tir::Buffer&gt;&amp;, bool)&gt; &gt;
+        at ../include/tvm/runtime/packed_func.h:1671
+  19: run&lt;&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  18: run&lt;tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  17: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  16: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  15: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1631
+  14: run&lt;tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_, tvm::runtime::TVMMovableArgValueWithContext_&gt;
+        at ../include/tvm/runtime/packed_func.h:1646
+  13: operator()
+        at ../src/driver/driver_api.cc:388
+  12: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array&lt;tvm::runtime::ObjectRef, void&gt; const&amp;, std::__cxx11::basic_string&lt;char, std::char_traits&lt;char&gt;, std::allocator&lt;char&gt; &gt; const&amp;, std::unordered_map&lt;tvm::te::Tensor, tvm::tir::Buffer, std::hash&lt;tvm::te::Tensor&gt;, std::equal_to&lt;tvm::te::Tensor&gt;, std::allocator&lt;std::pair&lt;tvm::te::Tensor const, tvm::tir::Buffer&gt; &gt; &gt; const&amp;, tvm::GlobalVarSupply, bool)
+        at ../src/driver/driver_api.cc:374
+  11: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array&lt;tvm::transform::Pass, void&gt;)
+        at ../src/driver/driver_api.cc:269
+  10: tvm::transform::Pass::operator()(tvm::IRModule) const
+        at ../src/ir/transform.cc:258
+  9: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:274
+  8: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:453
+  7: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/ir/transform.cc:274
+  6: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&amp;) const
+        at ../src/tir/ir/transform.cc:100
+  5: tvm::runtime::TypedPackedFunc&lt;tvm::tir::PrimFunc (tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext)&gt;::operator()(tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext) const
+        at ../include/tvm/runtime/packed_func.h:1750
+  4: tvm::tir::PrimFunc tvm::runtime::detail::typed_packed_call_dispatcher&lt;tvm::tir::PrimFunc&gt;::run&lt;tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext&gt;(tvm::runtime::PackedFunc const&amp;, tvm::tir::PrimFunc&amp;&amp;, tvm::IRModule&amp;&amp;, tvm::transform::PassContext&amp;&amp;)
+        at ../include/tvm/runtime/packed_func.h:1694
+  3: tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()&lt;tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext&gt;(tvm::tir::PrimFunc&amp;&amp;, tvm::IRModule&amp;&amp;, tvm::transform::PassContext&amp;&amp;) const
+        at ../include/tvm/runtime/packed_func.h:1618
+  2: tvm::runtime::PackedFuncObj::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
+        at ../include/tvm/runtime/packed_func.h:1217
+  1: Call
+        at ../include/tvm/runtime/packed_func.h:1213
+  0: operator()
+        at ../src/runtime/c_runtime_api.cc:534
+  File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
+  File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
+    raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 8, 4, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 256]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,8512482
+No: 6   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1090,8 +1060,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 16, 32, 1]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 2, 32]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2083444
-No: 7   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 256, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 8]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9782088
+No: 7   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1213,8 +1183,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 1, 32]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 64]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,7327507
-No: 8   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 32, 16, 1]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 4]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,5304899
+No: 8   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1336,8 +1306,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 4, 16]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 128]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10433677
-No: 9   GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 8, 2]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 16, 16]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4587081
+No: 9   GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1459,8 +1429,10 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 256, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 32]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,731288
-No: 10  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 32, 2]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 64]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,8876210
+No: 10  GFLOPS: 6.57/154.18     result: MeasureResult(costs=(0.035245632,), error_no=MeasureErrorNo.NO_ERROR, all_cost=7.224449396133423, timestamp=1668556474.7894113) [(&#39;tile_f&#39;, [-1, 2, 1, 32]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 8, 8]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,1850606
+No: 11  GFLOPS: 17.15/154.18    result: MeasureResult(costs=(0.013495552,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.2125194072723389, timestamp=1668556475.4980066)        [(&#39;tile_f&#39;, [-1, 2, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 16]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2455000
+No: 12  GFLOPS: 0.00/154.18     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1582,9 +1554,11 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 4, 2]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 128]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,1338992
-No: 11  GFLOPS: 5.38/12.53      result: MeasureResult(costs=(0.04300253675,), error_no=MeasureErrorNo.NO_ERROR, all_cost=3.1535797119140625, timestamp=1668549263.3956833)      [(&#39;tile_f&#39;, [-1, 32, 1, 2]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2139120
-No: 12  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 128, 2]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 2, 32]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,1696078
+No: 13  GFLOPS: 174.33/174.33   result: MeasureResult(costs=(0.001327968394736842,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.6654844284057617, timestamp=1668556479.535505)        [(&#39;tile_f&#39;, [-1, 1, 8, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 2]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,8171801
+No: 14  GFLOPS: 3.08/174.33     result: MeasureResult(costs=(0.07524625374999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=4.3694047927856445, timestamp=1668556480.8263621)        [(&#39;tile_f&#39;, [-1, 128, 2, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 1, 4]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,1811937
+No: 15  GFLOPS: 21.73/174.33    result: MeasureResult(costs=(0.0106537506,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.669083833694458, timestamp=1668556481.527801) [(&#39;tile_f&#39;, [-1, 8, 2, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 16]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,7286413
+No: 16  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1706,8 +1680,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 8, 16, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 32]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4981589
-No: 13  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 8, 8, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 32, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10313724
+No: 17  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1829,9 +1803,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 8, 64]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 32, 2]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10120209
-No: 14  GFLOPS: 3.18/12.53      result: MeasureResult(costs=(0.072806816,), error_no=MeasureErrorNo.NO_ERROR, all_cost=9.520730257034302, timestamp=1668549273.1264057) [(&#39;tile_f&#39;, [-1, 32, 4, 4]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 1]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9878560
-No: 15  GFLOPS: 0.00/12.53      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 1, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 32, 4]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2991064
+No: 18  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -1953,9 +1926,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 64, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 64]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,7912545
-No: 16  GFLOPS: 18.76/18.76     result: MeasureResult(costs=(0.012339853888888887,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.854473352432251, timestamp=1668549273.8730075)        [(&#39;tile_f&#39;, [-1, 32, 2, 4]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4268553
-No: 17  GFLOPS: 0.00/18.76      result: Traceback (most recent call last):
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 256, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 2, 8]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4940373
+No: 19  GFLOPS: 0.00/174.33     result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 588, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 540, in _build_func_common
@@ -2077,10 +2049,8 @@ Traceback (most recent call last):
   File &quot;tvm/_ffi/_cython/./packed_func.pxi&quot;, line 56, in tvm._ffi._cy3.core.tvm_callback
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 871, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
-tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 8, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 128, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,449582
-No: 18  GFLOPS: 262.69/262.69   result: MeasureResult(costs=(0.0008812803908045978,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.1055493354797363, timestamp=1668549280.0131285)      [(&#39;tile_f&#39;, [-1, 2, 16, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,8983955
-No: 19  GFLOPS: 70.02/262.69    result: MeasureResult(costs=(0.0033061094166666667,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.018292188644409, timestamp=1668549280.9508169)       [(&#39;tile_f&#39;, [-1, 1, 4, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 16, 4]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,1824415
-No: 20  GFLOPS: 82.93/262.69    result: MeasureResult(costs=(0.002791370583333333,), error_no=MeasureErrorNo.NO_ERROR, all_cost=5.6083526611328125, timestamp=1668549281.647997)        [(&#39;tile_f&#39;, [-1, 8, 2, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 1, 2]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9717133
+tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 16, 8, 2]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 128, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,5639343
+No: 20  GFLOPS: 195.91/195.91   result: MeasureResult(costs=(0.0011816482499999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=3.031599521636963, timestamp=1668556484.7983866)       [(&#39;tile_f&#39;, [-1, 2, 4, 2]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 8]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9973113
 </pre></div>
 </div>
 <p>Finally we can inspect the best config from log file, check correctness,
@@ -2119,9 +2089,9 @@ and measure running time.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Finish loading 20 records
 
 Best config:
-[(&#39;tile_f&#39;, [-1, 2, 16, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,8983955
+[(&#39;tile_f&#39;, [-1, 2, 4, 2]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 2, 8]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9973113
 Finish loading 20 records
-Time cost of this operator: 0.001286
+Time cost of this operator: 0.001614
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autotvm-tune-conv2d-cuda-py">
diff --git a/docs/how_to/work_with_microtvm/micro_autotune.html b/docs/how_to/work_with_microtvm/micro_autotune.html
index 0eed7118ef..e55e2277e5 100644
--- a/docs/how_to/work_with_microtvm/micro_autotune.html
+++ b/docs/how_to/work_with_microtvm/micro_autotune.html
@@ -596,10 +596,10 @@ the tuned operator.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>########## Build without Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)
 ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  309.4     98.723   (1, 2, 10, 10, 3)  2       1        [309.4]
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.048     0.973    (1, 6, 10, 10)     1       1        [3.048]
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.955     0.305    (1, 1, 10, 10, 3)  1       1        [0.955]
-Total_time                                    -                                             313.403   -        -                  -       -        -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  312.3     98.715   (1, 2, 10, 10, 3)  2       1        [312.3]
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.073     0.971    (1, 6, 10, 10)     1       1        [3.073]
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.991     0.313    (1, 1, 10, 10, 3)  1       1        [0.991]
+Total_time                                    -                                             316.364   -        -                  -       -        -
 </pre></div>
 </div>
 </div>
@@ -650,10 +650,10 @@ Total_time                                    -
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>########## Build with Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  Measurements(us)
 ---------                                     ---                                           --------  -------  -----              ------  -------  ----------------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  101.1     97.398   (1, 6, 10, 10, 1)  2       1        [101.1]
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.745     1.681    (1, 6, 10, 10)     1       1        [1.745]
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.956     0.921    (1, 1, 10, 10, 3)  1       1        [0.956]
-Total_time                                    -                                             103.801   -        -                  -       -        -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  103.1     97.424   (1, 6, 10, 10, 1)  2       1        [103.1]
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.773     1.675    (1, 6, 10, 10)     1       1        [1.773]
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.953     0.901    (1, 1, 10, 10, 3)  1       1        [0.953]
+Total_time                                    -                                             105.826   -        -                  -       -        -
 </pre></div>
 </div>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-autotune-py">
diff --git a/docs/how_to/work_with_microtvm/micro_pytorch.html b/docs/how_to/work_with_microtvm/micro_pytorch.html
index 8b8af554b3..cc6cfa2002 100644
--- a/docs/how_to/work_with_microtvm/micro_pytorch.html
+++ b/docs/how_to/work_with_microtvm/micro_pytorch.html
@@ -440,7 +440,7 @@ download a cat image and preprocess it to use as the model input.</p>
 Downloading: &quot;https://download.pytorch.org/models/quantized/mobilenet_v2_qnnpack_37f702c5.pth&quot; to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2_qnnpack_37f702c5.pth
 
   0%|          | 0.00/3.42M [00:00&lt;?, ?B/s]
-100%|##########| 3.42M/3.42M [00:00&lt;00:00, 73.3MB/s]
+100%|##########| 3.42M/3.42M [00:00&lt;00:00, 50.4MB/s]
 /workspace/python/tvm/relay/frontend/pytorch_utils.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
   return LooseVersion(torch_ver) &gt; ver
 /venv/apache-tvm-py3.7/lib/python3.7/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
@@ -564,7 +564,7 @@ via the host <cite>main.cc`</cite> or if a Zephyr emulated board is selected as
 Torch top-1 id: 282, class name: tiger cat
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  5.538 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  1.655 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-pytorch-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/12b9ecc04c41abaa12022061771821d1/micro_pytorch.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">micro_pytorch.py</span></code></a></p>
diff --git a/docs/how_to/work_with_microtvm/micro_train.html b/docs/how_to/work_with_microtvm/micro_train.html
index 3787afbfd4..c977250cb5 100644
--- a/docs/how_to/work_with_microtvm/micro_train.html
+++ b/docs/how_to/work_with_microtvm/micro_train.html
@@ -530,7 +530,7 @@ take about <strong>2 minutes</strong> to download the Stanford Cars, while COCO
 <a href="https://docs.python.org/3/library/shutil.html#shutil.move" title="shutil.move" class="sphx-glr-backref-module-shutil sphx-glr-backref-type-py-function"><span class="n">shutil</span><span class="o">.</span><span class="n">move</span></a><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><a href="https://docs.python.org/3/library/stdtypes.html#str" title="builtins.str" class="sphx-glr-backref-module-builtins sphx-glr-backref-typ [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&#39;/tmp/tmpjf6_zkcp/images/random&#39;
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&#39;/tmp/tmp6_yzhb7f/images/random&#39;
 </pre></div>
 </div>
 </div>
@@ -590,8 +590,8 @@ objects to other stuff? We can display some examples from our datasets using <co
     <span class="n">plt</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s2">&quot;off&quot;</span><span class="p">)</span>
 </pre></div>
 </div>
-<img src="../../_images/sphx_glr_micro_train_001.png" srcset="../../_images/sphx_glr_micro_train_001.png" alt="[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/tmp/tmpjf6_zkcp/images/target contains 8144 images
-/tmp/tmpjf6_zkcp/images/random contains 5000 images
+<img src="../../_images/sphx_glr_micro_train_001.png" srcset="../../_images/sphx_glr_micro_train_001.png" alt="[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]" class = "sphx-glr-single-img"/><div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/tmp/tmp6_yzhb7f/images/target contains 8144 images
+/tmp/tmp6_yzhb7f/images/random contains 5000 images
 </pre></div>
 </div>
 </div>
@@ -703,13 +703,13 @@ the time on our validation set).</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Epoch 1/3
-328/328 - 47s - loss: 0.2144 - accuracy: 0.9250 - val_loss: 0.1710 - val_accuracy: 0.9430 - 47s/epoch - 144ms/step
+328/328 - 47s - loss: 0.2093 - accuracy: 0.9259 - val_loss: 0.1729 - val_accuracy: 0.9362 - 47s/epoch - 143ms/step
 Epoch 2/3
-328/328 - 44s - loss: 0.0949 - accuracy: 0.9634 - val_loss: 0.1365 - val_accuracy: 0.9517 - 44s/epoch - 133ms/step
+328/328 - 44s - loss: 0.1036 - accuracy: 0.9642 - val_loss: 0.1694 - val_accuracy: 0.9434 - 44s/epoch - 133ms/step
 Epoch 3/3
-328/328 - 44s - loss: 0.0676 - accuracy: 0.9737 - val_loss: 0.1010 - val_accuracy: 0.9679 - 44s/epoch - 133ms/step
+328/328 - 43s - loss: 0.0639 - accuracy: 0.9754 - val_loss: 0.1391 - val_accuracy: 0.9558 - 43s/epoch - 132ms/step
 
-&lt;keras.callbacks.History object at 0x7feb7590e390&gt;
+&lt;keras.callbacks.History object at 0x7f283a6516d0&gt;
 </pre></div>
 </div>
 </div>
@@ -971,7 +971,7 @@ as intended.</p>
 <p>From here, we could modify the model to read live images from the camera - we have another
 Arduino tutorial for how to do that <a class="reference external" href="https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection">on GitHub</a>. Alternatively, we could also
 <a class="reference external" href="https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html">use TVM’s autotuning capabilities</a> to dramatically improve the model’s performance.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 4 minutes  41.884 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 4 minutes  48.368 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-train-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">micro_train.py</span></code></a></p>
diff --git a/docs/how_to/work_with_microtvm/sg_execution_times.html b/docs/how_to/work_with_microtvm/sg_execution_times.html
index a67091fd04..4c9ea2e788 100644
--- a/docs/how_to/work_with_microtvm/sg_execution_times.html
+++ b/docs/how_to/work_with_microtvm/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-microtvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>06:51.106</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
+<p><strong>06:50.430</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -349,23 +349,23 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_train.html#sphx-glr-how-to-work-with-microtvm-micro-train-py"><span class="std std-ref">Training Vision Models for microTVM on Arduino</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_train.py</span></code>)</p></td>
-<td><p>04:41.884</p></td>
+<td><p>04:48.368</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="micro_pytorch.html#sphx-glr-how-to-work-with-microtvm-micro-pytorch-py"><span class="std std-ref">microTVM PyTorch Tutorial</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_pytorch.py</span></code>)</p></td>
-<td><p>01:05.538</p></td>
+<td><p>01:01.655</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></td>
-<td><p>00:51.364</p></td>
+<td><p>00:48.472</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="micro_aot.html#sphx-glr-how-to-work-with-microtvm-micro-aot-py"><span class="std std-ref">microTVM Host-Driven AoT</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_aot.py</span></code>)</p></td>
-<td><p>00:08.446</p></td>
+<td><p>00:08.237</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></td>
-<td><p>00:03.871</p></td>
+<td><p>00:03.695</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="micro_reference_vm.html#sphx-glr-how-to-work-with-microtvm-micro-reference-vm-py"><span class="std std-ref">microTVM Reference Virtual Machines</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_reference_vm.py</span></code>)</p></td>
diff --git a/docs/how_to/work_with_relay/sg_execution_times.html b/docs/how_to/work_with_relay/sg_execution_times.html
index 30df80c571..1aa8c9f3ea 100644
--- a/docs/how_to/work_with_relay/sg_execution_times.html
+++ b/docs/how_to/work_with_relay/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-relay-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:45.425</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
+<p><strong>00:42.718</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -349,15 +349,15 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="using_pipeline_executor.html#sphx-glr-how-to-work-with-relay-using-pipeline-executor-py"><span class="std std-ref">Using Pipeline Executor in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_pipeline_executor.py</span></code>)</p></td>
-<td><p>00:33.298</p></td>
+<td><p>00:31.251</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></td>
-<td><p>00:10.352</p></td>
+<td><p>00:10.121</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></td>
-<td><p>00:01.769</p></td>
+<td><p>00:01.340</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></td>
diff --git a/docs/how_to/work_with_schedules/intrin_math.html b/docs/how_to/work_with_schedules/intrin_math.html
index 746840d9e1..c2b0b985d2 100644
--- a/docs/how_to/work_with_schedules/intrin_math.html
+++ b/docs/how_to/work_with_schedules/intrin_math.html
@@ -535,7 +535,7 @@ The following example customizes CUDA lowering rule for <code class="code docuti
 <a href="../../reference/api/python/ir.html#tvm.ir.register_intrin_lowering" title="tvm.ir.register_intrin_lowering" class="sphx-glr-backref-module-tvm-ir sphx-glr-backref-type-py-function"><span class="n">register_intrin_lowering</span></a><span class="p">(</span><span class="s2">&quot;tir.exp&quot;</span><span class="p">,</span> <span class="n">target</span><span class="o">=</span><span class="s2">&quot;cuda&quot;</span><span class="p">,</span> <span class="n">f</span><span class="o">= [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&lt;function my_cuda_math_rule at 0x7feb725c9c20&gt;
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>&lt;function my_cuda_math_rule at 0x7f283a7c7b90&gt;
 </pre></div>
 </div>
 <p>Register the rule to TVM with override option to override existing rule.
diff --git a/docs/how_to/work_with_schedules/sg_execution_times.html b/docs/how_to/work_with_schedules/sg_execution_times.html
index 600a74f078..25076d08ab 100644
--- a/docs/how_to/work_with_schedules/sg_execution_times.html
+++ b/docs/how_to/work_with_schedules/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-schedules-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:07.493</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
+<p><strong>00:07.140</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -349,35 +349,35 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></td>
-<td><p>00:05.115</p></td>
+<td><p>00:04.899</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></td>
-<td><p>00:01.017</p></td>
+<td><p>00:00.967</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></td>
-<td><p>00:00.580</p></td>
+<td><p>00:00.542</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></td>
-<td><p>00:00.561</p></td>
+<td><p>00:00.523</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="extern_op.html#sphx-glr-how-to-work-with-schedules-extern-op-py"><span class="std std-ref">External Tensor Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">extern_op.py</span></code>)</p></td>
-<td><p>00:00.119</p></td>
+<td><p>00:00.113</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></td>
-<td><p>00:00.052</p></td>
+<td><p>00:00.049</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></td>
-<td><p>00:00.030</p></td>
+<td><p>00:00.029</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></td>
-<td><p>00:00.020</p></td>
+<td><p>00:00.019</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/how_to/work_with_schedules/tensorize.html b/docs/how_to/work_with_schedules/tensorize.html
index c765fe3910..b64ad9592a 100644
--- a/docs/how_to/work_with_schedules/tensorize.html
+++ b/docs/how_to/work_with_schedules/tensorize.html
@@ -590,7 +590,7 @@ The importing needs to happen before the tensorized GEMV being executed.</p>
              C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
   buffer_map = {A_1: A, B_1: B, C_1: C}
   preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmpg0bbiniy/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmpg0bbiniy/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
+  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmpnxfsmwfa/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmpnxfsmwfa/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
   for (i, 0, 1024) {
     for (j.outer: int32, 0, 32) {
       @tir.call_extern(&quot;gemv_update&quot;, @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
diff --git a/docs/reference/api/python/auto_scheduler.html b/docs/reference/api/python/auto_scheduler.html
index 3ba0ad3278..d4730b25c6 100644
--- a/docs/reference/api/python/auto_scheduler.html
+++ b/docs/reference/api/python/auto_scheduler.html
@@ -1615,7 +1615,7 @@ history states as starting point to perform Evolutionary Search).</p></li>
 
 <dl class="py class">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.SketchPolicy">
-<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
+<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
 <dd><p>The search policy that searches in a hierarchical search space defined by sketches.
 The policy randomly samples programs from the space defined by sketches and use evolutionary
 search to fine-tune them.</p>
@@ -1899,7 +1899,7 @@ Candidates:
 
 <dl class="py function">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.auto_schedule">
-<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
+<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
 <dd><p>THIS API IS DEPRECATED.</p>
 <p>Run auto scheduling search for a task.</p>
 <dl class="field-list simple">
diff --git a/docs/reference/api/typedoc/classes/bytestreamreader.html b/docs/reference/api/typedoc/classes/bytestreamreader.html
index 9de44c1727..69a79ed54f 100644
--- a/docs/reference/api/typedoc/classes/bytestreamreader.html
+++ b/docs/reference/api/typedoc/classes/bytestreamreader.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -141,7 +141,7 @@
 					<div class="tsd-signature tsd-kind-icon">bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Uint8Array</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -151,7 +151,7 @@
 					<div class="tsd-signature tsd-kind-icon">offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -168,7 +168,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">Uint8Array</span></h4>
@@ -185,7 +185,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -202,7 +202,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/cachedcallstack.html b/docs/reference/api/typedoc/classes/cachedcallstack.html
index e7a06dee65..980bffa647 100644
--- a/docs/reference/api/typedoc/classes/cachedcallstack.html
+++ b/docs/reference/api/typedoc/classes/cachedcallstack.html
@@ -144,7 +144,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L223">memory.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L223">memory.ts:223</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -172,7 +172,7 @@
 					<div class="tsd-signature tsd-kind-icon">temp<wbr>Args<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><a href="../interfaces/disposable.html" class="tsd-signature-type">Disposable</a><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L208">memory.ts:208</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L208">memory.ts:208</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -194,7 +194,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L312">memory.ts:312</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L312">memory.ts:312</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L284">memory.ts:284</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L284">memory.ts:284</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -262,7 +262,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L388">memory.ts:388</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L388">memory.ts:388</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -300,7 +300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L376">memory.ts:376</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L376">memory.ts:376</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -340,7 +340,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L267">memory.ts:267</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L267">memory.ts:267</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -373,7 +373,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L243">memory.ts:243</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L243">memory.ts:243</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -390,7 +390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L321">memory.ts:321</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L321">memory.ts:321</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -422,7 +422,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L252">memory.ts:252</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L252">memory.ts:252</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -444,7 +444,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L359">memory.ts:359</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L359">memory.ts:359</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -470,7 +470,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L342">memory.ts:342</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L342">memory.ts:342</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -496,7 +496,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L350">memory.ts:350</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L350">memory.ts:350</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -522,7 +522,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L326">memory.ts:326</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L326">memory.ts:326</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -548,7 +548,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L363">memory.ts:363</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L363">memory.ts:363</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -574,7 +574,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L346">memory.ts:346</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L346">memory.ts:346</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -600,7 +600,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L334">memory.ts:334</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L334">memory.ts:334</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
diff --git a/docs/reference/api/typedoc/classes/dldatatype.html b/docs/reference/api/typedoc/classes/dldatatype.html
index a86bbb1ba9..7be8db1858 100644
--- a/docs/reference/api/typedoc/classes/dldatatype.html
+++ b/docs/reference/api/typedoc/classes/dldatatype.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">bits<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L260">runtime.ts:260</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L260">runtime.ts:260</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">code<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L258">runtime.ts:258</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L258">runtime.ts:258</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -177,7 +177,7 @@
 					<div class="tsd-signature tsd-kind-icon">lanes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -199,7 +199,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L279">runtime.ts:279</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L279">runtime.ts:279</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -216,7 +216,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L270">runtime.ts:270</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L270">runtime.ts:270</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/dldevice.html b/docs/reference/api/typedoc/classes/dldevice.html
index 8507a7903d..4a94cd225a 100644
--- a/docs/reference/api/typedoc/classes/dldevice.html
+++ b/docs/reference/api/typedoc/classes/dldevice.html
@@ -118,7 +118,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L202">runtime.ts:202</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L202">runtime.ts:202</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L200">runtime.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L200">runtime.ts:200</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -161,7 +161,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L198">runtime.ts:198</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L198">runtime.ts:198</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -183,7 +183,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L223">runtime.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L223">runtime.ts:223</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -205,7 +205,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L230">runtime.ts:230</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L230">runtime.ts:230</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/environment.html b/docs/reference/api/typedoc/classes/environment.html
index dc07283ca0..fc86eecd88 100644
--- a/docs/reference/api/typedoc/classes/environment.html
+++ b/docs/reference/api/typedoc/classes/environment.html
@@ -125,7 +125,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L86">environment.ts:86</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L86">environment.ts:86</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -169,7 +169,7 @@
 					<aside class="tsd-sources">
 						<p>Implementation of <a href="../interfaces/libraryprovider.html">LibraryProvider</a>.<a href="../interfaces/libraryprovider.html#imports">imports</a></p>
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L70">environment.ts:70</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L70">environment.ts:70</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L69">environment.ts:69</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L69">environment.ts:69</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -210,7 +210,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">ctypes.FTVMWasmPackedCFunc</span><span class="tsd-signature-symbol"> | </span><span class="tsd-signature-type">undefined</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = [undefined,]</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L78">environment.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L78">environment.ts:78</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -228,7 +228,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<wbr>Free<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L84">environment.ts:84</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L84">environment.ts:84</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -250,7 +250,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L105">environment.ts:105</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L105">environment.ts:105</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ffilibrary.html b/docs/reference/api/typedoc/classes/ffilibrary.html
index 7487af7927..220f0130c4 100644
--- a/docs/reference/api/typedoc/classes/ffilibrary.html
+++ b/docs/reference/api/typedoc/classes/ffilibrary.html
@@ -131,7 +131,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L49">runtime.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L49">runtime.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L46">runtime.ts:46</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L46">runtime.ts:46</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L45">runtime.ts:45</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L45">runtime.ts:45</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L44">runtime.ts:44</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L44">runtime.ts:44</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">webGPUContext<span class="tsd-signature-symbol">:</span> <a href="webgpucontext.html" class="tsd-signature-type">WebGPUContext</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L47">runtime.ts:47</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L47">runtime.ts:47</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -203,7 +203,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L76">runtime.ts:76</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L76">runtime.ts:76</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L66">runtime.ts:66</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L66">runtime.ts:66</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -243,7 +243,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L84">runtime.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L84">runtime.ts:84</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <a href="cachedcallstack.html" class="tsd-signature-type">CachedCallStack</a></h4>
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L95">runtime.ts:95</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L95">runtime.ts:95</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -283,7 +283,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L72">runtime.ts:72</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L72">runtime.ts:72</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/graphexecutor.html b/docs/reference/api/typedoc/classes/graphexecutor.html
index 4599f8ff51..5d54c2cd2a 100644
--- a/docs/reference/api/typedoc/classes/graphexecutor.html
+++ b/docs/reference/api/typedoc/classes/graphexecutor.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L583">runtime.ts:583</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L583">runtime.ts:583</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">module<span class="tsd-signature-symbol">:</span> <a href="module.html" class="tsd-signature-type">Module</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L579">runtime.ts:579</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L579">runtime.ts:579</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L654">runtime.ts:654</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L654">runtime.ts:654</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -224,7 +224,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L597">runtime.ts:597</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L597">runtime.ts:597</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -241,7 +241,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L631">runtime.ts:631</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L631">runtime.ts:631</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L644">runtime.ts:644</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L644">runtime.ts:644</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -310,7 +310,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L621">runtime.ts:621</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L621">runtime.ts:621</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -332,7 +332,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L609">runtime.ts:609</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L609">runtime.ts:609</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/instance.html b/docs/reference/api/typedoc/classes/instance.html
index 39af9ed031..f73d90e2c1 100644
--- a/docs/reference/api/typedoc/classes/instance.html
+++ b/docs/reference/api/typedoc/classes/instance.html
@@ -139,7 +139,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L692">runtime.ts:692</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L692">runtime.ts:692</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -202,7 +202,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L684">runtime.ts:684</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L684">runtime.ts:684</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -212,7 +212,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L683">runtime.ts:683</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L683">runtime.ts:683</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -229,7 +229,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L932">runtime.ts:932</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L932">runtime.ts:932</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L994">runtime.ts:994</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L994">runtime.ts:994</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -303,7 +303,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L924">runtime.ts:924</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L924">runtime.ts:924</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -341,7 +341,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L732">runtime.ts:732</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L732">runtime.ts:732</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -358,7 +358,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L952">runtime.ts:952</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L952">runtime.ts:952</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -402,7 +402,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L816">runtime.ts:816</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L816">runtime.ts:816</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -434,7 +434,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -465,7 +465,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L846">runtime.ts:846</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L846">runtime.ts:846</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -497,7 +497,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L750">runtime.ts:750</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L750">runtime.ts:750</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -520,7 +520,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -568,7 +568,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L789">runtime.ts:789</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L789">runtime.ts:789</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -608,7 +608,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L914">runtime.ts:914</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L914">runtime.ts:914</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -646,7 +646,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L1145">runtime.ts:1145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L1145">runtime.ts:1145</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -698,7 +698,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L740">runtime.ts:740</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L740">runtime.ts:740</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -722,7 +722,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L868">runtime.ts:868</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L868">runtime.ts:868</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -754,7 +754,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L857">runtime.ts:857</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L857">runtime.ts:857</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -786,7 +786,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L940">runtime.ts:940</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L940">runtime.ts:940</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/memory.html b/docs/reference/api/typedoc/classes/memory.html
index 7290994cbe..e5b938094b 100644
--- a/docs/reference/api/typedoc/classes/memory.html
+++ b/docs/reference/api/typedoc/classes/memory.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L40">memory.ts:40</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L40">memory.ts:40</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Memory</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L32">memory.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L32">memory.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span><span class="tsd-signature-symbol"> = true</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L33">memory.ts:33</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L33">memory.ts:33</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L154">memory.ts:154</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L154">memory.ts:154</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -210,7 +210,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L90">memory.ts:90</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L90">memory.ts:90</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -233,7 +233,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L97">memory.ts:97</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L97">memory.ts:97</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -256,7 +256,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L74">memory.ts:74</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L74">memory.ts:74</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L81">memory.ts:81</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L81">memory.ts:81</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -302,7 +302,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L104">memory.ts:104</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L104">memory.ts:104</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -325,7 +325,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L132">memory.ts:132</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L132">memory.ts:132</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -362,7 +362,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L145">memory.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L145">memory.ts:145</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -393,7 +393,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L60">memory.ts:60</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L60">memory.ts:60</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -416,7 +416,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L67">memory.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L67">memory.ts:67</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -439,7 +439,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L53">memory.ts:53</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L53">memory.ts:53</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -462,7 +462,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L114">memory.ts:114</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L114">memory.ts:114</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -485,7 +485,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L124">memory.ts:124</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L124">memory.ts:124</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -502,7 +502,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/memory.ts#L175">memory.ts:175</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/memory.ts#L175">memory.ts:175</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/module.html b/docs/reference/api/typedoc/classes/module.html
index c094a06638..f2fcdad6b8 100644
--- a/docs/reference/api/typedoc/classes/module.html
+++ b/docs/reference/api/typedoc/classes/module.html
@@ -124,7 +124,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L504">runtime.ts:504</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L504">runtime.ts:504</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L502">runtime.ts:502</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L502">runtime.ts:502</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -187,7 +187,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L516">runtime.ts:516</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L516">runtime.ts:516</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -204,7 +204,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L530">runtime.ts:530</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L530">runtime.ts:530</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -236,7 +236,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L561">runtime.ts:561</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L561">runtime.ts:561</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ndarray.html b/docs/reference/api/typedoc/classes/ndarray.html
index d01d73e006..7cbebe35eb 100644
--- a/docs/reference/api/typedoc/classes/ndarray.html
+++ b/docs/reference/api/typedoc/classes/ndarray.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L304">runtime.ts:304</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L304">runtime.ts:304</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -158,7 +158,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <a href="dldevice.html" class="tsd-signature-type">DLDevice</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L297">runtime.ts:297</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L297">runtime.ts:297</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -173,7 +173,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L293">runtime.ts:293</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L293">runtime.ts:293</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -188,7 +188,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L289">runtime.ts:289</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L289">runtime.ts:289</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -203,7 +203,7 @@
 					<div class="tsd-signature tsd-kind-icon">ndim<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L291">runtime.ts:291</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L291">runtime.ts:291</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -218,7 +218,7 @@
 					<div class="tsd-signature tsd-kind-icon">shape<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L295">runtime.ts:295</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L295">runtime.ts:295</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -240,7 +240,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L370">runtime.ts:370</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L370">runtime.ts:370</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -273,7 +273,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L414">runtime.ts:414</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L414">runtime.ts:414</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -305,7 +305,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L355">runtime.ts:355</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L355">runtime.ts:355</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -322,7 +322,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L474">runtime.ts:474</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L474">runtime.ts:474</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -346,7 +346,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L443">runtime.ts:443</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L443">runtime.ts:443</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/packedfunccell.html b/docs/reference/api/typedoc/classes/packedfunccell.html
index 2ec3a5c38c..2ab3cd99b0 100644
--- a/docs/reference/api/typedoc/classes/packedfunccell.html
+++ b/docs/reference/api/typedoc/classes/packedfunccell.html
@@ -122,7 +122,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L158">runtime.ts:158</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L158">runtime.ts:158</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L157">runtime.ts:157</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L157">runtime.ts:157</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -164,7 +164,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L165">runtime.ts:165</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L165">runtime.ts:165</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
diff --git a/docs/reference/api/typedoc/classes/rpcserver.html b/docs/reference/api/typedoc/classes/rpcserver.html
index 22d059944a..ee0185a6ee 100644
--- a/docs/reference/api/typedoc/classes/rpcserver.html
+++ b/docs/reference/api/typedoc/classes/rpcserver.html
@@ -115,7 +115,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">get<wbr>Imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">unknown</span><span class="tsd-signat [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -201,7 +201,7 @@
 					<div class="tsd-signature tsd-kind-icon">key<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -211,7 +211,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -242,7 +242,7 @@
 					<div class="tsd-signature tsd-kind-icon">socket<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">WebSocket</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -252,7 +252,7 @@
 					<div class="tsd-signature tsd-kind-icon">state<span class="tsd-signature-symbol">:</span> <a href="../enums/rpcserverstate.html" class="tsd-signature-type">RPCServerState</a><span class="tsd-signature-symbol"> = RPCServerState.InitHeader</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -262,7 +262,7 @@
 					<div class="tsd-signature tsd-kind-icon">url<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/classes/scalar.html b/docs/reference/api/typedoc/classes/scalar.html
index 7af48aef2b..ca5c7dca1f 100644
--- a/docs/reference/api/typedoc/classes/scalar.html
+++ b/docs/reference/api/typedoc/classes/scalar.html
@@ -112,7 +112,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -137,7 +137,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">value<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L143">runtime.ts:143</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L143">runtime.ts:143</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/webgpucontext.html b/docs/reference/api/typedoc/classes/webgpucontext.html
index 5bcad95760..6f2b25beda 100644
--- a/docs/reference/api/typedoc/classes/webgpucontext.html
+++ b/docs/reference/api/typedoc/classes/webgpucontext.html
@@ -120,7 +120,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -145,7 +145,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">GPUDevice</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -155,7 +155,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -172,7 +172,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -209,7 +209,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/enums/argtypecode.html b/docs/reference/api/typedoc/enums/argtypecode.html
index 9a25134615..1afebe20d3 100644
--- a/docs/reference/api/typedoc/enums/argtypecode.html
+++ b/docs/reference/api/typedoc/enums/argtypecode.html
@@ -106,7 +106,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 6</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -116,7 +116,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -126,7 +126,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -136,7 +136,7 @@
 					<div class="tsd-signature tsd-kind-icon">Null<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 12</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMDLTensor<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 7</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMModule<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 9</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMNDArray<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 13</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -196,7 +196,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObject<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -206,7 +206,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObjectRValue<wbr>Ref<wbr>Arg<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 14</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -216,7 +216,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMOpaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -226,7 +226,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMPacked<wbr>Func<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 10</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -236,7 +236,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 11</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -246,7 +246,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/aynccallbackcode.html b/docs/reference/api/typedoc/enums/aynccallbackcode.html
index e398730158..f4a3a6305c 100644
--- a/docs/reference/api/typedoc/enums/aynccallbackcode.html
+++ b/docs/reference/api/typedoc/enums/aynccallbackcode.html
@@ -93,7 +93,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Exception<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L676">runtime.ts:676</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L676">runtime.ts:676</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -103,7 +103,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L675">runtime.ts:675</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L675">runtime.ts:675</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/dldatatypecode.html b/docs/reference/api/typedoc/enums/dldatatypecode.html
index e65940b895..cfd81e7b95 100644
--- a/docs/reference/api/typedoc/enums/dldatatypecode.html
+++ b/docs/reference/api/typedoc/enums/dldatatypecode.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L242">runtime.ts:242</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L242">runtime.ts:242</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L240">runtime.ts:240</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L240">runtime.ts:240</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">Opaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L243">runtime.ts:243</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L243">runtime.ts:243</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -125,7 +125,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L241">runtime.ts:241</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L241">runtime.ts:241</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/rpcserverstate.html b/docs/reference/api/typedoc/enums/rpcserverstate.html
index f2c629295c..74fafea97c 100644
--- a/docs/reference/api/typedoc/enums/rpcserverstate.html
+++ b/docs/reference/api/typedoc/enums/rpcserverstate.html
@@ -90,7 +90,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<wbr>Key<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Server<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Body<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">Wait<wbr>For<wbr>Callback<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/sizeof.html b/docs/reference/api/typedoc/enums/sizeof.html
index ee08cc5ffc..3823fe04b2 100644
--- a/docs/reference/api/typedoc/enums/sizeof.html
+++ b/docs/reference/api/typedoc/enums/sizeof.html
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32 + I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">F32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">F64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">I32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -150,7 +150,7 @@
 					<div class="tsd-signature tsd-kind-icon">I64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -160,7 +160,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMValue<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">U16<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -180,7 +180,7 @@
 					<div class="tsd-signature tsd-kind-icon">U8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/index.html b/docs/reference/api/typedoc/index.html
index 48e6a607bb..f7811b2291 100644
--- a/docs/reference/api/typedoc/index.html
+++ b/docs/reference/api/typedoc/index.html
@@ -174,7 +174,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Alloc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>shape<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, ndim<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeCode<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeBits<span class="tsd [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>Bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">num [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -282,7 +282,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>To<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>from<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, to<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-sig [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -326,7 +326,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>ToBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</sp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -370,7 +370,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -406,7 +406,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMBackend<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number< [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -458,7 +458,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCFunc<wbr>Set<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ret<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -506,7 +506,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCb<wbr>Arg<wbr>ToReturn<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, code<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span c [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -545,7 +545,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Call<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-t [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -601,7 +601,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -637,7 +637,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Get<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span cla [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -676,7 +676,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>List<wbr>Global<wbr>Names<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>outSize<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, outArray<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -715,7 +715,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Register<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, f<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, override<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</spa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -758,7 +758,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMGet<wbr>Last<wbr>Error<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -788,7 +788,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -824,7 +824,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Get<wbr>Function<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, funcName<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, queryImports<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">numbe [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -872,7 +872,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Import<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, dep<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-si [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -912,7 +912,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMSynchronize<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>deviceType<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, deviceId<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signatur [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -954,7 +954,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Alloc<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>size<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -990,7 +990,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Free<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ptr<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1026,7 +1026,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Func<wbr>Create<wbr>FromCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resource<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1066,7 +1066,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>args<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1118,7 +1118,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<wbr>Finalizer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resourceHandle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1154,7 +1154,7 @@
 					<div class="tsd-signature tsd-kind-icon">GPUPointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1169,7 +1169,7 @@
 					<div class="tsd-signature tsd-kind-icon">Packed<wbr>Func<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">...</span>args<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol"> &amp; </span><a href="interfaces/disp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L36">runtime.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L36">runtime.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1184,7 +1184,7 @@
 					<div class="tsd-signature tsd-kind-icon">Pointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1199,7 +1199,7 @@
 					<div class="tsd-signature tsd-kind-icon">Ptr<wbr>Offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1217,7 +1217,7 @@
 					<div class="tsd-signature tsd-kind-icon">RPC_<wbr>MAGIC<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">1045105</span><span class="tsd-signature-symbol"> = 1045105</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1239,7 +1239,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/support.ts#L25">support.ts:25</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/support.ts#L25">support.ts:25</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1271,7 +1271,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/support.ts#L39">support.ts:39</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/support.ts#L39">support.ts:39</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1300,7 +1300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/support.ts#L52">support.ts:52</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/support.ts#L52">support.ts:52</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1337,7 +1337,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/compact.ts#L38">compact.ts:38</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/compact.ts#L38">compact.ts:38</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1368,7 +1368,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1390,7 +1390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/environment.ts#L32">environment.ts:32</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/environment.ts#L32">environment.ts:32</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1421,7 +1421,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/compact.ts#L24">compact.ts:24</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/compact.ts#L24">compact.ts:24</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1443,7 +1443,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L1367">runtime.ts:1367</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L1367">runtime.ts:1367</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1508,7 +1508,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/support.ts#L62">support.ts:62</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/support.ts#L62">support.ts:62</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1530,7 +1530,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<wbr>Code<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L246">runtime.ts:246</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L246">runtime.ts:246</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1539,7 +1539,7 @@
 						<div class="tsd-signature tsd-kind-icon">0<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;int&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L247">runtime.ts:247</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L247">runtime.ts:247</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1549,7 +1549,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;uint&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L248">runtime.ts:248</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L248">runtime.ts:248</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1559,7 +1559,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;float&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L249">runtime.ts:249</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L249">runtime.ts:249</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1569,7 +1569,7 @@
 						<div class="tsd-signature tsd-kind-icon">3<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;handle&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L250">runtime.ts:250</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L250">runtime.ts:250</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1580,7 +1580,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Enum<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L175">runtime.ts:175</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L175">runtime.ts:175</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1589,7 +1589,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L176">runtime.ts:176</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L176">runtime.ts:176</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1599,7 +1599,7 @@
 						<div class="tsd-signature tsd-kind-icon">15<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;webgpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L180">runtime.ts:180</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L180">runtime.ts:180</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1609,7 +1609,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cuda&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L177">runtime.ts:177</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L177">runtime.ts:177</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1619,7 +1619,7 @@
 						<div class="tsd-signature tsd-kind-icon">4<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;opencl&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L178">runtime.ts:178</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L178">runtime.ts:178</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1629,7 +1629,7 @@
 						<div class="tsd-signature tsd-kind-icon">8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;metal&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L179">runtime.ts:179</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L179">runtime.ts:179</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1640,7 +1640,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Str<wbr>ToEnum<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L183">runtime.ts:183</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L183">runtime.ts:183</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1649,7 +1649,7 @@
 						<div class="tsd-signature tsd-kind-icon">cl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L186">runtime.ts:186</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L186">runtime.ts:186</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1659,7 +1659,7 @@
 						<div class="tsd-signature tsd-kind-icon">cpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 1</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L184">runtime.ts:184</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L184">runtime.ts:184</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1669,7 +1669,7 @@
 						<div class="tsd-signature tsd-kind-icon">cuda<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 2</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L185">runtime.ts:185</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L185">runtime.ts:185</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1679,7 +1679,7 @@
 						<div class="tsd-signature tsd-kind-icon">metal<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 8</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L189">runtime.ts:189</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L189">runtime.ts:189</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1689,7 +1689,7 @@
 						<div class="tsd-signature tsd-kind-icon">opencl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L187">runtime.ts:187</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L187">runtime.ts:187</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1699,7 +1699,7 @@
 						<div class="tsd-signature tsd-kind-icon">vulkan<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 7</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L188">runtime.ts:188</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L188">runtime.ts:188</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1709,7 +1709,7 @@
 						<div class="tsd-signature tsd-kind-icon">webgpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 15</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/runtime.ts#L190">runtime.ts:190</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/runtime.ts#L190">runtime.ts:190</a></li>
 							</ul>
 						</aside>
 					</section>
diff --git a/docs/reference/api/typedoc/interfaces/disposable.html b/docs/reference/api/typedoc/interfaces/disposable.html
index f462e7e003..3cf78f8a11 100644
--- a/docs/reference/api/typedoc/interfaces/disposable.html
+++ b/docs/reference/api/typedoc/interfaces/disposable.html
@@ -113,7 +113,7 @@
 					<div class="tsd-signature tsd-kind-icon">dispose<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/types.ts#L52">types.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/types.ts#L52">types.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/interfaces/functioninfo.html b/docs/reference/api/typedoc/interfaces/functioninfo.html
index 8df6d691f7..b75132b427 100644
--- a/docs/reference/api/typedoc/interfaces/functioninfo.html
+++ b/docs/reference/api/typedoc/interfaces/functioninfo.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">arg_<wbr>types<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">launch_<wbr>param_<wbr>tags<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">name<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/interfaces/libraryprovider.html b/docs/reference/api/typedoc/interfaces/libraryprovider.html
index 241da55a0c..50faaa13c3 100644
--- a/docs/reference/api/typedoc/interfaces/libraryprovider.html
+++ b/docs/reference/api/typedoc/interfaces/libraryprovider.html
@@ -112,7 +112,7 @@
 					<div class="tsd-signature tsd-kind-icon">imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/types.ts#L34">types.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/types.ts#L34">types.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -127,7 +127,7 @@
 					<div class="tsd-signature tsd-kind-icon">start<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>inst<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">Instance</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/bac450a64/web/src/types.ts#L39">types.ts:39</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/4f4b4edaf/web/src/types.ts#L39">types.ts:39</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/searchindex.js b/docs/searchindex.js
index 4aa172ada1..292e8422b6 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
+Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
diff --git a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
index 9076e98217..6d8623335d 100644
--- a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:27.288</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
+<p><strong>00:25.907</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 82%" />
@@ -349,7 +349,7 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></td>
-<td><p>00:27.282</p></td>
+<td><p>00:25.900</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></td>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_classification.html b/docs/topic/vta/tutorials/frontend/deploy_classification.html
index 05cf4d5d10..7c5bac596d 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_classification.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_classification.html
@@ -582,7 +582,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
   DeprecationWarning,
 /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
   relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-resnet18_v1 inference graph built in 30.41s!
+resnet18_v1 inference graph built in 28.60s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_detection.html b/docs/topic/vta/tutorials/frontend/deploy_detection.html
index bf75d134c1..cc20359272 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_detection.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_detection.html
@@ -600,7 +600,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/workspace/python/tvm/relay/build_module.py:348: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
   DeprecationWarning,
-yolov3-tiny inference graph built in 20.30s!
+yolov3-tiny inference graph built in 19.25s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/sg_execution_times.html b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
index f2146cc126..886590c08a 100644
--- a/docs/topic/vta/tutorials/frontend/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-frontend-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>01:42.979</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
+<p><strong>01:39.922</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -349,11 +349,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></td>
-<td><p>00:52.646</p></td>
+<td><p>00:51.507</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></td>
-<td><p>00:50.332</p></td>
+<td><p>00:48.415</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/topic/vta/tutorials/optimize/sg_execution_times.html b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
index 19ce707c0f..a84105e6af 100644
--- a/docs/topic/vta/tutorials/optimize/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-optimize-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:03.160</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
+<p><strong>00:03.120</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 84%" />
@@ -349,11 +349,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></td>
-<td><p>00:02.707</p></td>
+<td><p>00:02.691</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></td>
-<td><p>00:00.454</p></td>
+<td><p>00:00.429</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/topic/vta/tutorials/sg_execution_times.html b/docs/topic/vta/tutorials/sg_execution_times.html
index 672cf733ef..3dc4e60f98 100644
--- a/docs/topic/vta/tutorials/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:00.818</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
+<p><strong>00:00.767</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 81%" />
@@ -349,11 +349,11 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></td>
-<td><p>00:00.439</p></td>
+<td><p>00:00.416</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></td>
-<td><p>00:00.379</p></td>
+<td><p>00:00.351</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 </tbody>
diff --git a/docs/tutorial/auto_scheduler_matmul_x86.html b/docs/tutorial/auto_scheduler_matmul_x86.html
index 3c9b47d2ef..c9aa534112 100644
--- a/docs/tutorial/auto_scheduler_matmul_x86.html
+++ b/docs/tutorial/auto_scheduler_matmul_x86.html
@@ -491,6 +491,9 @@ trials, we can load the best schedule from the log file and apply it.</p>
 <a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">sch</span></a><span class="p">,</span> <a href="../reference/api/python/ir.html#tvm.ir.Array" title="tvm.ir.Array" class="sphx-glr-backref-module-tvm-ir sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">args</span></a> <span class="o">=</span> <a href="../reference/api/pyth [...]
 </pre></div>
 </div>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>*E
+</pre></div>
+</div>
 </div>
 <div class="section" id="inspecting-the-optimized-schedule">
 <h2>Inspecting the Optimized Schedule<a class="headerlink" href="#inspecting-the-optimized-schedule" title="Permalink to this headline">¶</a></h2>
@@ -578,7 +581,7 @@ operator fusion.</p>
 <span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 94.425 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 97.652 ms
 </pre></div>
 </div>
 </div>
@@ -642,7 +645,7 @@ resume the status and do more 5 trials.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Resume search:
 /venv/apache-tvm-py3.7/lib/python3.7/site-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
   warnings.warn(f&#39;Old style callback is deprecated.  See: {link}&#39;, UserWarning)
-.T
+*E
 </pre></div>
 </div>
 </div>
@@ -653,7 +656,7 @@ automatically optimize a matrix multiplication, without the need to specify a
 search template.  It ends a series of examples that starts from the Tensor
 Expression (TE) language that demonstrates how TVM can optimize computational
 operations.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  33.374 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  29.784 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-auto-scheduler-matmul-x86-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/eac4389b114db015e95cb3cdf8b86b83/auto_scheduler_matmul_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">auto_scheduler_matmul_x86.py</span></code></a></p>
diff --git a/docs/tutorial/autotvm_matmul_x86.html b/docs/tutorial/autotvm_matmul_x86.html
index a9f4444ed8..2e58d71c11 100644
--- a/docs/tutorial/autotvm_matmul_x86.html
+++ b/docs/tutorial/autotvm_matmul_x86.html
@@ -679,16 +679,16 @@ reduce variance, we take 5 measurements and average them.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>waiting for device...
 device available
 Get devices for measurement successfully!
-No: 1   GFLOPS: 9.56/9.56       result: MeasureResult(costs=(0.028084570999999996,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6163735389709473, timestamp=1668547810.291327)        [(&#39;tile_y&#39;, [-1, 2]), (&#39;tile_x&#39;, [-1, 64])],None,61
-No: 2   GFLOPS: 12.75/12.75     result: MeasureResult(costs=(0.0210513222,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5504179000854492, timestamp=1668547810.842157)        [(&#39;tile_y&#39;, [-1, 8]), (&#39;tile_x&#39;, [-1, 512])],None,93
-No: 3   GFLOPS: 9.92/12.75      result: MeasureResult(costs=(0.0270647714,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.9314725399017334, timestamp=1668547812.2323968)       [(&#39;tile_y&#39;, [-1, 16]), (&#39;tile_x&#39;, [-1, 128])],None,74
-No: 4   GFLOPS: 1.95/12.75      result: MeasureResult(costs=(0.13787345239999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.365994453430176, timestamp=1668547815.3772676) [(&#39;tile_y&#39;, [-1, 8]), (&#39;tile_x&#39;, [-1, 2])],None,13
-No: 5   GFLOPS: 2.08/12.75      result: MeasureResult(costs=(0.1292589286,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.2243378162384033, timestamp=1668547817.7326822)       [(&#39;tile_y&#39;, [-1, 4]), (&#39;tile_x&#39;, [-1, 4])],None,22
-No: 6   GFLOPS: 2.00/12.75      result: MeasureResult(costs=(0.1344603278,), error_no=MeasureErrorNo.NO_ERROR, all_cost=2.305664539337158, timestamp=1668547820.0587761)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 8])],None,39
-No: 7   GFLOPS: 2.47/12.75      result: MeasureResult(costs=(0.10846081300000002,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.8945081233978271, timestamp=1668547822.7405117)        [(&#39;tile_y&#39;, [-1, 8]), (&#39;tile_x&#39;, [-1, 4])],None,23
-No: 8   GFLOPS: 8.35/12.75      result: MeasureResult(costs=(0.0321628392,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.7031145095825195, timestamp=1668547823.469019)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 32])],None,59
-No: 9   GFLOPS: 6.73/12.75      result: MeasureResult(costs=(0.0398940858,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.7958028316497803, timestamp=1668547824.3780992)       [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 128])],None,79
-No: 10  GFLOPS: 3.64/12.75      result: MeasureResult(costs=(0.07378304599999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.3074233531951904, timestamp=1668547825.7257895)        [(&#39;tile_y&#39;, [-1, 128]), (&#39;tile_x&#39;, [-1, 16])],None,47
+No: 1   GFLOPS: 13.76/13.76     result: MeasureResult(costs=(0.0195130834,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.4759232997894287, timestamp=1668555113.156249)        [(&#39;tile_y&#39;, [-1, 256]), (&#39;tile_x&#39;, [-1, 64])],None,68
+No: 2   GFLOPS: 11.50/13.76     result: MeasureResult(costs=(0.0233377094,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5467529296875, timestamp=1668555114.4190085)  [(&#39;tile_y&#39;, [-1, 128]), (&#39;tile_x&#39;, [-1, 32])],None,57
+No: 3   GFLOPS: 2.42/13.76      result: MeasureResult(costs=(0.11075313839999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.9274091720581055, timestamp=1668555117.0882974)        [(&#39;tile_y&#39;, [-1, 2]), (&#39;tile_x&#39;, [-1, 4])],None,21
+No: 4   GFLOPS: 9.09/13.76      result: MeasureResult(costs=(0.0295406538,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.6850738525390625, timestamp=1668555118.4477098)       [(&#39;tile_y&#39;, [-1, 16]), (&#39;tile_x&#39;, [-1, 32])],None,54
+No: 5   GFLOPS: 4.17/13.76      result: MeasureResult(costs=(0.0644424472,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.2029130458831787, timestamp=1668555119.8010767)       [(&#39;tile_y&#39;, [-1, 16]), (&#39;tile_x&#39;, [-1, 16])],None,44
+No: 6   GFLOPS: 10.46/13.76     result: MeasureResult(costs=(0.025660678599999997,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.615302324295044, timestamp=1668555120.375982) [(&#39;tile_y&#39;, [-1, 8]), (&#39;tile_x&#39;, [-1, 64])],None,63
+No: 7   GFLOPS: 12.69/13.76     result: MeasureResult(costs=(0.021158569,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.5504105091094971, timestamp=1668555120.923032) [(&#39;tile_y&#39;, [-1, 32]), (&#39;tile_x&#39;, [-1, 128])],None,75
+No: 8   GFLOPS: 10.14/13.76     result: MeasureResult(costs=(0.0264689112,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.621466875076294, timestamp=1668555121.5158722)        [(&#39;tile_y&#39;, [-1, 512]), (&#39;tile_x&#39;, [-1, 128])],None,79
+No: 9   GFLOPS: 12.21/13.76     result: MeasureResult(costs=(0.021977817599999998,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.46729397773742676, timestamp=1668555122.4011207)      [(&#39;tile_y&#39;, [-1, 2]), (&#39;tile_x&#39;, [-1, 512])],None,91
+No: 10  GFLOPS: 14.67/14.67     result: MeasureResult(costs=(0.018297431,), error_no=MeasureErrorNo.NO_ERROR, all_cost=0.746845006942749, timestamp=1668555122.9000688) [(&#39;tile_y&#39;, [-1, 64]), (&#39;tile_x&#39;, [-1, 64])],None,66
 </pre></div>
 </div>
 <p>With tuning completed, we can choose the configuration from the log file that
diff --git a/docs/tutorial/autotvm_relay_x86.html b/docs/tutorial/autotvm_relay_x86.html
index 443caf61b6..b0428ba490 100644
--- a/docs/tutorial/autotvm_relay_x86.html
+++ b/docs/tutorial/autotvm_relay_x86.html
@@ -560,7 +560,7 @@ standard deviation.</p>
 <span class="nb">print</span><span class="p">(</span><a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">unoptimized</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 519.9615746099993, &#39;median&#39;: 520.8512474500026, &#39;std&#39;: 2.491058036681814}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 519.9631207500033, &#39;median&#39;: 519.4895130000077, &#39;std&#39;: 3.823469731553201}
 </pre></div>
 </div>
 </div>
@@ -712,179 +712,179 @@ depending on the specifics of the model and the target platform.</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  1/25]  Current/Best:    9.79/  22.52 GFLOPS | Progress: (4/20) | 7.90 s
-[Task  1/25]  Current/Best:   11.15/  22.52 GFLOPS | Progress: (8/20) | 11.13 s
-[Task  1/25]  Current/Best:    9.07/  22.52 GFLOPS | Progress: (12/20) | 14.34 s
-[Task  1/25]  Current/Best:   17.57/  22.52 GFLOPS | Progress: (16/20) | 16.29 s
-[Task  1/25]  Current/Best:   12.88/  22.52 GFLOPS | Progress: (20/20) | 21.99 s Done.
+[Task  1/25]  Current/Best:    6.30/  16.02 GFLOPS | Progress: (4/20) | 8.76 s
+[Task  1/25]  Current/Best:    5.58/  18.29 GFLOPS | Progress: (8/20) | 12.27 s
+[Task  1/25]  Current/Best:   12.76/  18.29 GFLOPS | Progress: (12/20) | 16.61 s
+[Task  1/25]  Current/Best:   11.32/  18.29 GFLOPS | Progress: (16/20) | 19.31 s
+[Task  1/25]  Current/Best:   16.04/  18.29 GFLOPS | Progress: (20/20) | 21.35 s Done.
 
 [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  2/25]  Current/Best:   14.85/  18.91 GFLOPS | Progress: (4/20) | 3.07 s
-[Task  2/25]  Current/Best:   16.42/  18.91 GFLOPS | Progress: (8/20) | 4.24 s
-[Task  2/25]  Current/Best:   11.18/  19.88 GFLOPS | Progress: (12/20) | 6.42 s
-[Task  2/25]  Current/Best:    2.10/  19.88 GFLOPS | Progress: (16/20) | 7.70 s
-[Task  2/25]  Current/Best:    6.23/  19.88 GFLOPS | Progress: (20/20) | 9.01 s Done.
+[Task  2/25]  Current/Best:   12.10/  13.19 GFLOPS | Progress: (4/20) | 3.01 s
+[Task  2/25]  Current/Best:   14.59/  16.67 GFLOPS | Progress: (8/20) | 4.28 s
+[Task  2/25]  Current/Best:   15.14/  16.67 GFLOPS | Progress: (12/20) | 5.64 s
+[Task  2/25]  Current/Best:   20.51/  20.51 GFLOPS | Progress: (16/20) | 6.76 s
+[Task  2/25]  Current/Best:   10.88/  20.51 GFLOPS | Progress: (20/20) | 8.38 s Done.
 
 [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  3/25]  Current/Best:    6.39/  19.47 GFLOPS | Progress: (4/20) | 3.42 s
-[Task  3/25]  Current/Best:   14.31/  19.47 GFLOPS | Progress: (8/20) | 5.62 s
-[Task  3/25]  Current/Best:   15.20/  23.65 GFLOPS | Progress: (12/20) | 7.64 s
-[Task  3/25]  Current/Best:   10.28/  23.65 GFLOPS | Progress: (16/20) | 9.72 s
-[Task  3/25]  Current/Best:   17.80/  23.65 GFLOPS | Progress: (20/20) | 12.05 s Done.
+[Task  3/25]  Current/Best:    8.24/  16.36 GFLOPS | Progress: (4/20) | 3.57 s
+[Task  3/25]  Current/Best:   17.74/  17.74 GFLOPS | Progress: (8/20) | 5.88 s
+[Task  3/25]  Current/Best:   12.29/  18.40 GFLOPS | Progress: (12/20) | 7.82 s
+[Task  3/25]  Current/Best:    9.81/  18.83 GFLOPS | Progress: (16/20) | 9.47 s
+[Task  3/25]  Current/Best:    7.45/  18.83 GFLOPS | Progress: (20/20) | 13.35 s Done.
 
 [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  4/25]  Current/Best:   12.03/  12.03 GFLOPS | Progress: (4/20) | 6.69 s
-[Task  4/25]  Current/Best:   10.61/  20.12 GFLOPS | Progress: (8/20) | 11.00 s
-[Task  4/25]  Current/Best:   11.30/  20.12 GFLOPS | Progress: (12/20) | 13.39 s
-[Task  4/25]  Current/Best:    6.16/  20.12 GFLOPS | Progress: (16/20) | 15.36 s
-[Task  4/25]  Current/Best:    2.94/  22.31 GFLOPS | Progress: (20/20) | 17.44 s Done.
+[Task  4/25]  Current/Best:   10.97/  16.77 GFLOPS | Progress: (4/20) | 3.70 s
+[Task  4/25]  Current/Best:    8.30/  16.77 GFLOPS | Progress: (8/20) | 8.81 s
+[Task  4/25]  Current/Best:   18.89/  20.29 GFLOPS | Progress: (12/20) | 10.24 s
+[Task  4/25]  Current/Best:    9.57/  20.29 GFLOPS | Progress: (16/20) | 13.28 s
+[Task  4/25]  Current/Best:   11.02/  20.29 GFLOPS | Progress: (20/20) | 18.27 s Done.
 
 [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  5/25]  Current/Best:    8.69/  13.85 GFLOPS | Progress: (4/20) | 4.75 s
-[Task  5/25]  Current/Best:   17.55/  19.35 GFLOPS | Progress: (8/20) | 6.40 s
-[Task  5/25]  Current/Best:   14.33/  19.35 GFLOPS | Progress: (12/20) | 8.80 s
-[Task  5/25]  Current/Best:   12.94/  19.35 GFLOPS | Progress: (16/20) | 10.94 s
-[Task  5/25]  Current/Best:   17.42/  19.35 GFLOPS | Progress: (20/20) | 12.75 s Done.
+[Task  5/25]  Current/Best:    6.45/  19.24 GFLOPS | Progress: (4/20) | 3.41 s
+[Task  5/25]  Current/Best:    4.71/  19.24 GFLOPS | Progress: (8/20) | 5.56 s
+[Task  5/25]  Current/Best:    8.27/  19.24 GFLOPS | Progress: (12/20) | 7.43 s
+[Task  5/25]  Current/Best:   17.30/  19.24 GFLOPS | Progress: (16/20) | 8.82 s
+[Task  5/25]  Current/Best:    8.43/  19.24 GFLOPS | Progress: (20/20) | 10.50 s Done.
 
 [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  6/25]  Current/Best:   12.05/  21.19 GFLOPS | Progress: (4/20) | 5.53 s
-[Task  6/25]  Current/Best:   19.68/  21.19 GFLOPS | Progress: (8/20) | 8.14 s
-[Task  6/25]  Current/Best:   17.39/  21.19 GFLOPS | Progress: (12/20) | 9.76 s
-[Task  6/25]  Current/Best:   18.35/  21.19 GFLOPS | Progress: (16/20) | 12.29 s
-[Task  6/25]  Current/Best:   10.49/  21.19 GFLOPS | Progress: (20/20) | 15.77 s Done.
+[Task  6/25]  Current/Best:   11.19/  18.44 GFLOPS | Progress: (4/20) | 4.45 s
+[Task  6/25]  Current/Best:   17.87/  18.44 GFLOPS | Progress: (8/20) | 6.87 s
+[Task  6/25]  Current/Best:   14.84/  18.44 GFLOPS | Progress: (12/20) | 9.45 s
+[Task  6/25]  Current/Best:   12.30/  18.44 GFLOPS | Progress: (16/20) | 11.96 s
+[Task  6/25]  Current/Best:   18.04/  18.44 GFLOPS | Progress: (20/20) | 14.25 s Done.
 
 [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  7/25]  Current/Best:   16.12/  16.12 GFLOPS | Progress: (4/20) | 4.14 s
-[Task  7/25]  Current/Best:   18.22/  18.51 GFLOPS | Progress: (8/20) | 5.79 s
-[Task  7/25]  Current/Best:   12.01/  18.51 GFLOPS | Progress: (12/20) | 8.79 s
-[Task  7/25]  Current/Best:    5.22/  18.51 GFLOPS | Progress: (16/20) | 11.32 s
-[Task  7/25]  Current/Best:   16.70/  21.32 GFLOPS | Progress: (20/20) | 13.00 s Done.
+[Task  7/25]  Current/Best:    8.45/  20.43 GFLOPS | Progress: (4/20) | 3.56 s
+[Task  7/25]  Current/Best:   19.54/  20.43 GFLOPS | Progress: (8/20) | 5.53 s
+[Task  7/25]  Current/Best:    7.12/  20.43 GFLOPS | Progress: (12/20) | 9.18 s
+[Task  7/25]  Current/Best:    4.79/  20.43 GFLOPS | Progress: (16/20) | 11.50 s
+[Task  7/25]  Current/Best:   18.26/  20.43 GFLOPS | Progress: (20/20) | 13.22 s Done.
 
 [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  8/25]  Current/Best:    6.73/  14.93 GFLOPS | Progress: (4/20) | 3.58 s
-[Task  8/25]  Current/Best:   12.08/  14.93 GFLOPS | Progress: (8/20) | 7.23 s
-[Task  8/25]  Current/Best:   14.05/  14.93 GFLOPS | Progress: (12/20) | 9.22 s
-[Task  8/25]  Current/Best:   11.30/  14.93 GFLOPS | Progress: (16/20) | 12.32 s
-[Task  8/25]  Current/Best:    7.09/  14.93 GFLOPS | Progress: (20/20) | 15.40 s Done.
-
+[Task  8/25]  Current/Best:   10.86/  21.23 GFLOPS | Progress: (4/20) | 4.72 s
+[Task  8/25]  Current/Best:   12.20/  21.23 GFLOPS | Progress: (8/20) | 7.58 s
+[Task  8/25]  Current/Best:   16.26/  21.23 GFLOPS | Progress: (12/20) | 12.29 s
+[Task  8/25]  Current/Best:    8.91/  21.23 GFLOPS | Progress: (16/20) | 23.46 s
+[Task  8/25]  Current/Best:   15.06/  21.23 GFLOPS | Progress: (20/20) | 30.96 s
 [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  9/25]  Current/Best:   16.59/  16.59 GFLOPS | Progress: (4/20) | 2.87 s
-[Task  9/25]  Current/Best:    7.68/  16.59 GFLOPS | Progress: (8/20) | 6.11 s
-[Task  9/25]  Current/Best:    5.14/  17.82 GFLOPS | Progress: (12/20) | 11.52 s
-[Task  9/25]  Current/Best:   13.77/  17.82 GFLOPS | Progress: (16/20) | 13.04 s
-[Task  9/25]  Current/Best:   17.43/  17.82 GFLOPS | Progress: (20/20) | 15.78 s Done.
-
+[Task  9/25]  Current/Best:   22.97/  22.97 GFLOPS | Progress: (4/20) | 3.24 s
+[Task  9/25]  Current/Best:    6.35/  22.97 GFLOPS | Progress: (8/20) | 14.21 s
+[Task  9/25]  Current/Best:    9.57/  22.97 GFLOPS | Progress: (12/20) | 16.84 s
+[Task  9/25]  Current/Best:   19.72/  22.97 GFLOPS | Progress: (16/20) | 19.52 s
+[Task  9/25]  Current/Best:   15.49/  22.97 GFLOPS | Progress: (20/20) | 27.82 s
 [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 10/25]  Current/Best:   15.02/  15.02 GFLOPS | Progress: (4/20) | 5.51 s
-[Task 10/25]  Current/Best:   14.34/  15.02 GFLOPS | Progress: (8/20) | 7.37 s
-[Task 10/25]  Current/Best:   16.98/  20.74 GFLOPS | Progress: (12/20) | 9.77 s
-[Task 10/25]  Current/Best:   18.55/  20.74 GFLOPS | Progress: (16/20) | 11.34 s
-[Task 10/25]  Current/Best:   18.82/  20.74 GFLOPS | Progress: (20/20) | 12.74 s Done.
+[Task 10/25]  Current/Best:    1.61/  18.26 GFLOPS | Progress: (4/20) | 3.81 s
+[Task 10/25]  Current/Best:    5.08/  18.26 GFLOPS | Progress: (8/20) | 5.98 s
+[Task 10/25]  Current/Best:   13.31/  18.26 GFLOPS | Progress: (12/20) | 8.21 s
+[Task 10/25]  Current/Best:   13.18/  20.96 GFLOPS | Progress: (16/20) | 9.81 s
+[Task 10/25]  Current/Best:   18.23/  20.96 GFLOPS | Progress: (20/20) | 12.13 s Done.
 
 [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 11/25]  Current/Best:    9.37/  21.18 GFLOPS | Progress: (4/20) | 3.30 s
-[Task 11/25]  Current/Best:   10.46/  21.18 GFLOPS | Progress: (8/20) | 6.26 s
-[Task 11/25]  Current/Best:   16.02/  23.84 GFLOPS | Progress: (12/20) | 8.28 s
-[Task 11/25]  Current/Best:   16.43/  23.84 GFLOPS | Progress: (16/20) | 10.55 s
-[Task 11/25]  Current/Best:    9.43/  23.84 GFLOPS | Progress: (20/20) | 12.86 s Done.
+[Task 11/25]  Current/Best:    8.51/  18.10 GFLOPS | Progress: (4/20) | 3.62 s
+[Task 11/25]  Current/Best:   20.09/  20.09 GFLOPS | Progress: (8/20) | 5.75 s
+[Task 11/25]  Current/Best:    3.13/  21.09 GFLOPS | Progress: (12/20) | 8.15 s
+[Task 11/25]  Current/Best:   10.01/  23.89 GFLOPS | Progress: (16/20) | 10.09 s
+[Task 11/25]  Current/Best:    6.50/  23.89 GFLOPS | Progress: (20/20) | 13.01 s Done.
 
 [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 12/25]  Current/Best:    8.36/  15.79 GFLOPS | Progress: (4/20) | 3.76 s
-[Task 12/25]  Current/Best:   11.30/  15.79 GFLOPS | Progress: (8/20) | 6.79 s
-[Task 12/25]  Current/Best:   11.25/  20.28 GFLOPS | Progress: (12/20) | 9.26 s
-[Task 12/25]  Current/Best:   17.29/  20.28 GFLOPS | Progress: (16/20) | 11.26 s
-[Task 12/25]  Current/Best:    4.49/  20.28 GFLOPS | Progress: (20/20) | 15.05 s Done.
+[Task 12/25]  Current/Best:    9.64/  10.75 GFLOPS | Progress: (4/20) | 6.72 s
+[Task 12/25]  Current/Best:   14.44/  21.04 GFLOPS | Progress: (8/20) | 8.68 s
+[Task 12/25]  Current/Best:   14.24/  21.04 GFLOPS | Progress: (12/20) | 12.11 s
+[Task 12/25]  Current/Best:    7.33/  21.04 GFLOPS | Progress: (16/20) | 14.05 s Done.
+ Done.
+
+[Task 12/25]  Current/Best:    4.40/  21.04 GFLOPS | Progress: (20/20) | 17.28 s Done.
 
 [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 13/25]  Current/Best:   11.72/  21.57 GFLOPS | Progress: (4/20) | 4.77 s
-[Task 13/25]  Current/Best:   11.82/  22.16 GFLOPS | Progress: (8/20) | 6.59 s
-[Task 13/25]  Current/Best:   12.33/  22.16 GFLOPS | Progress: (12/20) | 10.64 s
-[Task 13/25]  Current/Best:   21.08/  22.16 GFLOPS | Progress: (16/20) | 12.71 s
-[Task 13/25]  Current/Best:    9.97/  22.16 GFLOPS | Progress: (20/20) | 15.49 s Done.
+[Task 13/25]  Current/Best:    9.92/  17.36 GFLOPS | Progress: (4/20) | 3.89 s
+[Task 13/25]  Current/Best:   18.24/  18.24 GFLOPS | Progress: (8/20) | 7.12 s
+[Task 13/25]  Current/Best:   18.56/  18.56 GFLOPS | Progress: (12/20) | 9.73 s
+[Task 13/25]  Current/Best:    9.15/  18.58 GFLOPS | Progress: (16/20) | 12.70 s
+[Task 13/25]  Current/Best:    5.75/  18.58 GFLOPS | Progress: (20/20) | 16.49 s Done.
 
 [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 14/25]  Current/Best:   16.69/  16.69 GFLOPS | Progress: (4/20) | 4.66 s
-[Task 14/25]  Current/Best:   10.59/  16.69 GFLOPS | Progress: (8/20) | 7.12 s
-[Task 14/25]  Current/Best:    6.40/  16.69 GFLOPS | Progress: (12/20) | 13.47 s
-[Task 14/25]  Current/Best:    4.00/  19.18 GFLOPS | Progress: (16/20) | 15.81 s
-[Task 14/25]  Current/Best:   13.48/  21.64 GFLOPS | Progress: (20/20) | 18.76 s
+[Task 14/25]  Current/Best:   16.18/  17.99 GFLOPS | Progress: (4/20) | 3.63 s
+[Task 14/25]  Current/Best:    3.73/  17.99 GFLOPS | Progress: (8/20) | 5.71 s
+[Task 14/25]  Current/Best:   17.32/  17.99 GFLOPS | Progress: (12/20) | 8.23 s
+[Task 14/25]  Current/Best:    3.02/  17.99 GFLOPS | Progress: (16/20) | 11.39 s
+[Task 14/25]  Current/Best:    8.99/  21.47 GFLOPS | Progress: (20/20) | 13.05 s
 [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 15/25]  Current/Best:   18.94/  18.94 GFLOPS | Progress: (4/20) | 3.35 s
-[Task 15/25]  Current/Best:   16.01/  18.94 GFLOPS | Progress: (8/20) | 5.58 s
-[Task 15/25]  Current/Best:    9.18/  18.94 GFLOPS | Progress: (12/20) | 9.13 s Done.
+[Task 15/25]  Current/Best:   10.33/  10.33 GFLOPS | Progress: (4/20) | 7.04 s
+[Task 15/25]  Current/Best:   16.52/  17.57 GFLOPS | Progress: (8/20) | 9.75 s
+[Task 15/25]  Current/Best:   17.24/  18.01 GFLOPS | Progress: (12/20) | 11.01 s
+[Task 15/25]  Current/Best:    8.38/  18.01 GFLOPS | Progress: (16/20) | 12.41 s
+[Task 15/25]  Current/Best:   20.18/  20.18 GFLOPS | Progress: (20/20) | 15.15 s
+[Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
+[Task 16/25]  Current/Best:   14.08/  16.10 GFLOPS | Progress: (4/20) | 3.52 s
+[Task 16/25]  Current/Best:    7.51/  18.33 GFLOPS | Progress: (8/20) | 6.46 s
+[Task 16/25]  Current/Best:   13.62/  18.33 GFLOPS | Progress: (12/20) | 7.86 s
+[Task 16/25]  Current/Best:   11.88/  18.33 GFLOPS | Progress: (16/20) | 9.68 s
+[Task 16/25]  Current/Best:   19.83/  19.83 GFLOPS | Progress: (20/20) | 10.95 s Done.
 
-[Task 15/25]  Current/Best:   21.86/  21.86 GFLOPS | Progress: (16/20) | 11.11 s
-[Task 15/25]  Current/Best:   15.42/  21.86 GFLOPS | Progress: (20/20) | 12.60 s Done.
+[Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
+ Done.
 
-[Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 16/25]  Current/Best:   10.56/  18.46 GFLOPS | Progress: (4/20) | 4.23 s
-[Task 16/25]  Current/Best:    3.07/  18.46 GFLOPS | Progress: (8/20) | 6.09 s
-[Task 16/25]  Current/Best:   18.66/  18.66 GFLOPS | Progress: (12/20) | 8.30 s
-[Task 16/25]  Current/Best:    3.03/  20.00 GFLOPS | Progress: (16/20) | 11.28 s
-[Task 16/25]  Current/Best:   10.59/  20.00 GFLOPS | Progress: (20/20) | 14.23 s Done.
-
-[Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 17/25]  Current/Best:   10.50/  17.21 GFLOPS | Progress: (4/20) | 4.00 s
-[Task 17/25]  Current/Best:   11.94/  17.31 GFLOPS | Progress: (8/20) | 6.20 s
-[Task 17/25]  Current/Best:   10.69/  18.69 GFLOPS | Progress: (12/20) | 8.66 s
-[Task 17/25]  Current/Best:   12.14/  18.69 GFLOPS | Progress: (16/20) | 14.50 s
-[Task 17/25]  Current/Best:   11.74/  21.32 GFLOPS | Progress: (20/20) | 16.58 s Done.
+[Task 17/25]  Current/Best:   23.90/  23.90 GFLOPS | Progress: (4/20) | 3.59 s
+[Task 17/25]  Current/Best:    3.06/  23.90 GFLOPS | Progress: (8/20) | 6.48 s
+[Task 17/25]  Current/Best:   18.87/  23.90 GFLOPS | Progress: (12/20) | 9.29 s
+[Task 17/25]  Current/Best:   11.24/  23.90 GFLOPS | Progress: (16/20) | 12.20 s
+[Task 17/25]  Current/Best:   13.64/  23.90 GFLOPS | Progress: (20/20) | 15.10 s Done.
 
 [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 18/25]  Current/Best:    2.99/  16.91 GFLOPS | Progress: (4/20) | 4.08 s
-[Task 18/25]  Current/Best:   11.11/  19.32 GFLOPS | Progress: (8/20) | 7.06 s
-[Task 18/25]  Current/Best:   17.72/  19.32 GFLOPS | Progress: (12/20) | 9.28 s
-[Task 18/25]  Current/Best:    6.24/  19.32 GFLOPS | Progress: (16/20) | 12.39 s
-[Task 18/25]  Current/Best:    6.16/  19.32 GFLOPS | Progress: (20/20) | 15.26 s Done.
+[Task 18/25]  Current/Best:    9.20/  19.70 GFLOPS | Progress: (4/20) | 3.29 s
+[Task 18/25]  Current/Best:   15.73/  22.18 GFLOPS | Progress: (8/20) | 4.67 s
+[Task 18/25]  Current/Best:   18.75/  22.18 GFLOPS | Progress: (12/20) | 6.22 s
+[Task 18/25]  Current/Best:   13.73/  22.18 GFLOPS | Progress: (16/20) | 12.25 s
+[Task 18/25]  Current/Best:   12.95/  22.18 GFLOPS | Progress: (20/20) | 14.55 s Done.
 
 [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 19/25]  Current/Best:   19.86/  19.86 GFLOPS | Progress: (4/20) | 4.57 s
-[Task 19/25]  Current/Best:   12.24/  19.86 GFLOPS | Progress: (8/20) | 8.43 s
-[Task 19/25]  Current/Best:   20.60/  20.60 GFLOPS | Progress: (12/20) | 11.37 s
-[Task 19/25]  Current/Best:   18.12/  20.60 GFLOPS | Progress: (16/20) | 14.31 s
-[Task 19/25]  Current/Best:   14.09/  20.60 GFLOPS | Progress: (20/20) | 17.88 s Done.
+[Task 19/25]  Current/Best:   11.73/  21.49 GFLOPS | Progress: (4/20) | 5.70 s
+[Task 19/25]  Current/Best:    5.38/  22.19 GFLOPS | Progress: (8/20) | 9.07 s
+[Task 19/25]  Current/Best:   11.66/  22.19 GFLOPS | Progress: (12/20) | 11.82 s
+[Task 19/25]  Current/Best:   11.29/  22.19 GFLOPS | Progress: (16/20) | 14.52 s
+[Task 19/25]  Current/Best:   22.44/  22.44 GFLOPS | Progress: (20/20) | 16.45 s Done.
 
 [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 20/25]  Current/Best:   20.99/  20.99 GFLOPS | Progress: (4/20) | 4.44 s
-[Task 20/25]  Current/Best:   13.83/  20.99 GFLOPS | Progress: (8/20) | 6.18 s
-[Task 20/25]  Current/Best:   16.74/  20.99 GFLOPS | Progress: (12/20) | 8.31 s
-[Task 20/25]  Current/Best:   16.87/  20.99 GFLOPS | Progress: (16/20) | 10.89 s
-[Task 20/25]  Current/Best:   13.87/  20.99 GFLOPS | Progress: (20/20) | 16.77 s
+[Task 20/25]  Current/Best:   18.37/  18.37 GFLOPS | Progress: (4/20) | 3.69 s
+[Task 20/25]  Current/Best:   14.28/  21.65 GFLOPS | Progress: (8/20) | 6.74 s
+[Task 20/25]  Current/Best:   10.12/  21.65 GFLOPS | Progress: (12/20) | 8.44 s
+[Task 20/25]  Current/Best:   10.23/  21.65 GFLOPS | Progress: (16/20) | 10.63 s
+[Task 20/25]  Current/Best:   10.34/  21.65 GFLOPS | Progress: (20/20) | 13.57 s
 [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 21/25]  Current/Best:   10.46/  13.00 GFLOPS | Progress: (4/20) | 3.46 s
-[Task 21/25]  Current/Best:   17.41/  17.41 GFLOPS | Progress: (8/20) | 5.99 s Done.
+[Task 21/25]  Current/Best:   17.50/  17.50 GFLOPS | Progress: (4/20) | 3.64 s
+[Task 21/25]  Current/Best:    6.70/  17.50 GFLOPS | Progress: (8/20) | 6.78 s
+[Task 21/25]  Current/Best:    2.70/  17.50 GFLOPS | Progress: (12/20) | 11.39 s Done.
+
+[Task 21/25]  Current/Best:   14.55/  18.53 GFLOPS | Progress: (16/20) | 13.26 s
+[Task 21/25]  Current/Best:   21.30/  21.30 GFLOPS | Progress: (20/20) | 14.50 s Done.
 
-[Task 21/25]  Current/Best:   17.35/  17.41 GFLOPS | Progress: (12/20) | 7.85 s
-[Task 21/25]  Current/Best:    8.34/  21.60 GFLOPS | Progress: (16/20) | 10.65 s
-[Task 21/25]  Current/Best:   10.56/  21.60 GFLOPS | Progress: (20/20) | 12.41 s
 [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 22/25]  Current/Best:   13.18/  13.18 GFLOPS | Progress: (4/20) | 4.36 s
-[Task 22/25]  Current/Best:   17.35/  17.35 GFLOPS | Progress: (8/20) | 5.92 s
-[Task 22/25]  Current/Best:    8.05/  17.35 GFLOPS | Progress: (12/20) | 8.86 s
-[Task 22/25]  Current/Best:    5.21/  17.35 GFLOPS | Progress: (16/20) | 10.92 s
-[Task 22/25]  Current/Best:    5.28/  20.03 GFLOPS | Progress: (20/20) | 12.46 s Done.
+[Task 22/25]  Current/Best:   12.14/  17.90 GFLOPS | Progress: (4/20) | 3.35 s
+[Task 22/25]  Current/Best:    9.73/  22.40 GFLOPS | Progress: (8/20) | 5.40 s
+[Task 22/25]  Current/Best:   17.49/  22.40 GFLOPS | Progress: (12/20) | 7.37 s
+[Task 22/25]  Current/Best:    8.50/  22.40 GFLOPS | Progress: (16/20) | 9.17 s
+[Task 22/25]  Current/Best:    7.64/  22.40 GFLOPS | Progress: (20/20) | 10.78 s Done.
 
 [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 23/25]  Current/Best:   12.01/  15.04 GFLOPS | Progress: (4/20) | 5.53 s
-[Task 23/25]  Current/Best:   18.57/  20.58 GFLOPS | Progress: (8/20) | 7.88 s
-[Task 23/25]  Current/Best:   12.00/  20.58 GFLOPS | Progress: (12/20) | 12.52 s
-[Task 23/25]  Current/Best:    1.55/  20.58 GFLOPS | Progress: (16/20) | 17.71 s
-[Task 23/25]  Current/Best:   10.10/  20.58 GFLOPS | Progress: (20/20) | 21.59 s Done.
+[Task 23/25]  Current/Best:   11.90/  20.14 GFLOPS | Progress: (4/20) | 3.74 s
+[Task 23/25]  Current/Best:   10.70/  20.14 GFLOPS | Progress: (8/20) | 7.00 s
+[Task 23/25]  Current/Best:    9.68/  20.14 GFLOPS | Progress: (12/20) | 9.71 s
+[Task 23/25]  Current/Best:   19.18/  22.44 GFLOPS | Progress: (16/20) | 12.16 s
+[Task 23/25]  Current/Best:   20.12/  22.44 GFLOPS | Progress: (20/20) | 15.14 s Done.
 
 [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 24/25]  Current/Best:    3.78/   3.78 GFLOPS | Progress: (4/20) | 12.11 s
-[Task 24/25]  Current/Best:    1.66/   3.78 GFLOPS | Progress: (8/20) | 22.65 s
-[Task 24/25]  Current/Best:    1.47/  10.01 GFLOPS | Progress: (12/20) | 30.11 s
-[Task 24/25]  Current/Best:    3.14/  10.01 GFLOPS | Progress: (16/20) | 40.80 s
-[Task 24/25]  Current/Best:    3.44/  10.01 GFLOPS | Progress: (20/20) | 52.52 s
-[Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s Done.
- Done.
-
-[Task 25/25]  Current/Best:    3.49/   7.44 GFLOPS | Progress: (4/20) | 2.96 s
-[Task 25/25]  Current/Best:    8.99/   8.99 GFLOPS | Progress: (8/20) | 13.66 s
-[Task 25/25]  Current/Best:    6.05/   8.99 GFLOPS | Progress: (12/20) | 25.32 s
-[Task 25/25]  Current/Best:    9.04/   9.04 GFLOPS | Progress: (16/20) | 30.87 s
-[Task 25/25]  Current/Best:    4.74/   9.05 GFLOPS | Progress: (20/20) | 32.12 s
+[Task 24/25]  Current/Best:    5.42/  10.21 GFLOPS | Progress: (4/20) | 4.24 s
+[Task 24/25]  Current/Best:    4.85/  10.21 GFLOPS | Progress: (8/20) | 10.23 s
+[Task 24/25]  Current/Best:   10.05/  10.21 GFLOPS | Progress: (12/20) | 17.83 s
+[Task 24/25]  Current/Best:    3.91/  10.21 GFLOPS | Progress: (16/20) | 28.54 s
+[Task 24/25]  Current/Best:    3.39/  10.21 GFLOPS | Progress: (20/20) | 39.26 s
+[Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
+[Task 25/25]  Current/Best:    5.69/   7.08 GFLOPS | Progress: (4/20) | 12.95 s Done.
+
+[Task 25/25]  Current/Best:    5.72/   8.59 GFLOPS | Progress: (8/20) | 14.69 s
+[Task 25/25]  Current/Best:    1.55/   8.59 GFLOPS | Progress: (12/20) | 19.83 s
+[Task 25/25]  Current/Best:    5.77/   8.75 GFLOPS | Progress: (16/20) | 24.88 s
+[Task 25/25]  Current/Best:    3.91/   8.75 GFLOPS | Progress: (20/20) | 30.22 s Done.
 </pre></div>
 </div>
 <p>The output from this tuning process will look something like this:</p>
@@ -945,8 +945,8 @@ model using optimized operators to speed up our computations.</p>
     <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;class=&#39;</span><span class="si">%s</span><span class="s2">&#39; with probability=</span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><a href="https://docs.python.org/3/library/stdtypes.html#list" title="builtins.list" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">labels</span></a [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>class=&#39;n02123045 tabby, tabby cat&#39; with probability=0.621102
-class=&#39;n02123159 tiger cat&#39; with probability=0.356379
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>class=&#39;n02123045 tabby, tabby cat&#39; with probability=0.621104
+class=&#39;n02123159 tiger cat&#39; with probability=0.356378
 class=&#39;n02124075 Egyptian cat&#39; with probability=0.019712
 class=&#39;n02129604 tiger, Panthera tigris&#39; with probability=0.001215
 class=&#39;n04040759 radiator&#39; with probability=0.000262
@@ -983,8 +983,8 @@ improvement in comparing the optimized model to the unoptimized model.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;unoptimized: </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><a href="https://docs.python.org/3/library/stdtypes.html#dict" title="builtins.dict" class="sphx-glr-backref-module-builtins sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">unoptimized</span></a><span class="p">))</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 417.65824050000447, &#39;median&#39;: 418.0077908000044, &#39;std&#39;: 4.044211110186569}
-unoptimized: {&#39;mean&#39;: 519.9615746099993, &#39;median&#39;: 520.8512474500026, &#39;std&#39;: 2.491058036681814}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 410.64341234000494, &#39;median&#39;: 411.674945499999, &#39;std&#39;: 2.992942377835809}
+unoptimized: {&#39;mean&#39;: 519.9631207500033, &#39;median&#39;: 519.4895130000077, &#39;std&#39;: 3.823469731553201}
 </pre></div>
 </div>
 </div>
@@ -998,7 +998,7 @@ models.</p>
 <p>Here we presented a simple example using ResNet-50 v2 locally. However, TVM
 supports many more features including cross-compilation, remote execution and
 profiling/benchmarking.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  45.363 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  32.397 seconds)</p>
 <div class="sphx-glr-footer sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-autotvm-relay-x86-py">
 <div class="sphx-glr-download sphx-glr-download-python docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/57a45d9bef1af358191e7d50043e652c/autotvm_relay_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">autotvm_relay_x86.py</span></code></a></p>
diff --git a/docs/tutorial/cross_compilation_and_rpc.html b/docs/tutorial/cross_compilation_and_rpc.html
index 98b3f6d7f0..cb2c6f4d1a 100644
--- a/docs/tutorial/cross_compilation_and_rpc.html
+++ b/docs/tutorial/cross_compilation_and_rpc.html
@@ -537,7 +537,7 @@ device and returns the measured cost. Network overhead is excluded.</p>
 <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;</span><span class="si">%g</span><span class="s2"> secs/op&quot;</span> <span class="o">%</span> <span class="n">cost</span><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.261e-07 secs/op
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.262e-07 secs/op
 </pre></div>
 </div>
 </div>
diff --git a/docs/tutorial/intro_topi.html b/docs/tutorial/intro_topi.html
index 9606a851ba..4f63bf19f0 100644
--- a/docs/tutorial/intro_topi.html
+++ b/docs/tutorial/intro_topi.html
@@ -497,7 +497,7 @@ we can schedule the following series of operations ending with <code class="code
 <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/ir.html#tvm.ir.Array" title="tvm.ir.Array" class="sphx-glr-backref-module-tvm-ir sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">sg</span><span class="o">.</span><span class="n">stages</span></a><span class="p">)</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0xa50fa70)), stage(b, placeholder(b, 0x21fc3170)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[i [...]
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0x3751670)), stage(b, placeholder(b, 0x18707210)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[i [...]
 </pre></div>
 </div>
 <p>We can test the correctness by comparing with <code class="code docutils literal notranslate"><span class="pre">numpy</span></code> result as follows</p>
diff --git a/docs/tutorial/sg_execution_times.html b/docs/tutorial/sg_execution_times.html
index 6b18b3fd0a..0ecff1afc0 100644
--- a/docs/tutorial/sg_execution_times.html
+++ b/docs/tutorial/sg_execution_times.html
@@ -340,7 +340,7 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-tutorial-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>14:18.064</strong> total execution time for <strong>tutorial</strong> files:</p>
+<p><strong>13:53.607</strong> total execution time for <strong>tutorial</strong> files:</p>
 <table class="docutils align-default">
 <colgroup>
 <col style="width: 83%" />
@@ -349,35 +349,35 @@
 </colgroup>
 <tbody>
 <tr class="row-odd"><td><p><a class="reference internal" href="autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py"><span class="std std-ref">Compiling and Optimizing a Model with the Python Interface (AutoTVM)</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_relay_x86.py</span></code>)</p></td>
-<td><p>10:45.363</p></td>
+<td><p>10:32.397</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="auto_scheduler_matmul_x86.html#sphx-glr-tutorial-auto-scheduler-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Auto-scheduling</span></a> (<code class="docutils literal notranslate"><span class="pre">auto_scheduler_matmul_x86.py</span></code>)</p></td>
-<td><p>01:33.374</p></td>
+<td><p>01:29.784</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="tensor_expr_get_started.html#sphx-glr-tutorial-tensor-expr-get-started-py"><span class="std std-ref">Working with Operators Using Tensor Expression</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_expr_get_started.py</span></code>)</p></td>
-<td><p>00:59.992</p></td>
+<td><p>00:59.972</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="relay_quick_start.html#sphx-glr-tutorial-relay-quick-start-py"><span class="std std-ref">Quick Start Tutorial for Compiling Deep Learning Models</span></a> (<code class="docutils literal notranslate"><span class="pre">relay_quick_start.py</span></code>)</p></td>
-<td><p>00:36.726</p></td>
+<td><p>00:35.579</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="autotvm_matmul_x86.html#sphx-glr-tutorial-autotvm-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Schedule Templates and AutoTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_matmul_x86.py</span></code>)</p></td>
-<td><p>00:20.427</p></td>
+<td><p>00:14.334</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-even"><td><p><a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></td>
-<td><p>00:01.221</p></td>
+<tr class="row-even"><td><p><a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></td>
+<td><p>00:00.758</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></td>
-<td><p>00:00.776</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></td>
+<td><p>00:00.628</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="cross_compilation_and_rpc.html#sphx-glr-tutorial-cross-compilation-and-rpc-py"><span class="std std-ref">Cross Compilation and RPC</span></a> (<code class="docutils literal notranslate"><span class="pre">cross_compilation_and_rpc.py</span></code>)</p></td>
-<td><p>00:00.177</p></td>
+<td><p>00:00.148</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="reference internal" href="introduction.html#sphx-glr-tutorial-introduction-py"><span class="std std-ref">Introduction</span></a> (<code class="docutils literal notranslate"><span class="pre">introduction.py</span></code>)</p></td>
@@ -388,7 +388,7 @@
 <td><p>00:00.001</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Starting using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></td>
 <td><p>00:00.001</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
@@ -396,7 +396,7 @@
 <td><p>00:00.001</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Starting using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></td>
 <td><p>00:00.001</p></td>
 <td><p>0.0 MB</p></td>
 </tr>
diff --git a/docs/tutorial/tensor_expr_get_started.html b/docs/tutorial/tensor_expr_get_started.html
index 57c81db0e9..7a09037852 100644
--- a/docs/tutorial/tensor_expr_get_started.html
+++ b/docs/tutorial/tensor_expr_get_started.html
@@ -551,8 +551,8 @@ helper function to run a profile of the TVM generated code.</p>
 <span class="n">evaluate_addition</span><span class="p">(</span><span class="n">fadd</span><span class="p">,</span> <a href="../reference/api/python/target.html#tvm.target.Target" title="tvm.target.Target" class="sphx-glr-backref-module-tvm-target sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">tgt</span></a><span class="p">,</span> <span class="s2">&quot;naive&quot;</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.html#list" ti [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.000012
-naive: 0.000006
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.000008
+naive: 0.000007
 </pre></div>
 </div>
 </div>
@@ -601,7 +601,7 @@ compile and run this new schedule with the parallel operation applied:</p>
 <span class="n">evaluate_addition</span><span class="p">(</span><span class="n">fadd_parallel</span><span class="p">,</span> <a href="../reference/api/python/target.html#tvm.target.Target" title="tvm.target.Target" class="sphx-glr-backref-module-tvm-target sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">tgt</span></a><span class="p">,</span> <span class="s2">&quot;parallel&quot;</span><span class="p">,</span> <a href="https://docs.python.org/3/library/stdtypes.h [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000010
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000007
 </pre></div>
 </div>
 </div>
@@ -673,10 +673,10 @@ factor to be the number of threads on your CPU.</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Operator                  Timing             Performance
-   numpy    1.1799070000506617e-05                   1.0
-   naive    5.833399999999999e-06    0.49439489720372287
-parallel    1.0206899999999999e-05    0.8650597038208727
-  vector             2.45952e-05       2.084503270083486
+   numpy    7.690120000916067e-06                    1.0
+   naive              6.7037e-06      0.8717289196009215
+parallel    6.941300000000001e-06      0.902625706643477
+  vector    2.4571799999999997e-05     3.195242726650942
 </pre></div>
 </div>
 <div class="admonition-code-specialization admonition">
@@ -992,7 +992,7 @@ matrix multiplication.</p>
 <span class="n">answer</span> <span class="o">=</span> <span class="n">numpy</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">numpy</span><span class="p">(),</span> <span class="n">b</span><span class="o">.</span><span class="n">numpy</span><span class="p">())</span>
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018890
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018246
 </pre></div>
 </div>
 <p>Now we write a basic matrix multiplication using TVM TE and verify that it
@@ -1033,7 +1033,7 @@ optimizations.</p>
 <span class="n">evaluate_operation</span><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">s</span></a><span class="p">,</span> <span class="p">[</span><a href="../reference/api/python/te.html#tvm.te.Tensor" title="tvm.te.Tensor" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>none: 3.273652
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>none: 3.275580
 </pre></div>
 </div>
 <p>Let’s take a look at the intermediate representation of the operator and
@@ -1098,7 +1098,7 @@ schedule.</p>
 <span class="n">evaluate_operation</span><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class="n">s</span></a><span class="p">,</span> <span class="p">[</span><a href="../reference/api/python/te.html#tvm.te.Tensor" title="tvm.te.Tensor" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.326503
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.305448
 </pre></div>
 </div>
 <p>By reordering the computation to take advantage of caching, you should see a
@@ -1157,7 +1157,7 @@ already cache friendly from our previous optimizations.</p>
 <span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/driver.html#tvm.lower" title="tvm.lower" class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span></a><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.352691
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.340588
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1212,7 +1212,7 @@ more cache friendly.</p>
 <span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/driver.html#tvm.lower" title="tvm.lower" class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span></a><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.126409
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.112845
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1288,7 +1288,7 @@ optimized schedule.</p>
 <span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/driver.html#tvm.lower" title="tvm.lower" class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span></a><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.110843
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.107526
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1362,7 +1362,7 @@ to `C</cite> when all the block results are ready.</p>
 <span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/driver.html#tvm.lower" title="tvm.lower" class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span></a><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.112136
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.111202
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1429,7 +1429,7 @@ of thread-level parallelization.</p>
 <span class="nb">print</span><span class="p">(</span><a href="../reference/api/python/driver.html#tvm.lower" title="tvm.lower" class="sphx-glr-backref-module-tvm sphx-glr-backref-type-py-function"><span class="n">tvm</span><span class="o">.</span><span class="n">lower</span></a><span class="p">(</span><a href="../reference/api/python/te.html#tvm.te.Schedule" title="tvm.te.Schedule" class="sphx-glr-backref-module-tvm-te sphx-glr-backref-type-py-class sphx-glr-backref-instance"><span class [...]
 </pre></div>
 </div>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.147541
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.146423
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1491,13 +1491,13 @@ working, we can compare the results.</p>
 </pre></div>
 </div>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>        Operator                  Timing             Performance
-            none      3.2736517458999996                     1.0
-        blocking            0.3265026695     0.09973653120217195
-   vectorization            0.3526911035      0.1077362929461629
-loop permutation     0.12640937730000001    0.038614179855361266
-   array packing     0.11084349060000001    0.033859279851261845
-   block caching     0.11213572229999999     0.03425401692175763
- parallelization     0.14754136820000002     0.04506935363078385
+            none      3.2755796965000004                     1.0
+        blocking     0.30544793070000004     0.09325003785631446
+   vectorization             0.340588204     0.10397799338050695
+loop permutation     0.11284525279999999     0.03445046778149731
+   array packing     0.10752643740000001    0.032826689429933095
+   block caching             0.111202406     0.03394892394736151
+ parallelization            0.1464231732     0.04470145341188159
 </pre></div>
 </div>
 <p>Note that the outputs on the web page reflect the running times on a