Posted to commits@tvm.apache.org by tq...@apache.org on 2022/06/03 23:58:13 UTC

[tvm-site] branch asf-site updated: deploying docs (apache/tvm@f05ebde8e84e4bce620b0fdf839b89eb60c1008c)

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b5ea3d902 deploying docs (apache/tvm@f05ebde8e84e4bce620b0fdf839b89eb60c1008c)
b5ea3d902 is described below

commit b5ea3d90272d3aaf76ca310fd66aaf62a1e1e01d
Author: tvm-bot <95...@users.noreply.github.com>
AuthorDate: Fri Jun 3 23:58:08 2022 +0000

    deploying docs (apache/tvm@f05ebde8e84e4bce620b0fdf839b89eb60c1008c)
---
 .../micro_train.ipynb                              |  309 ++++++
 .../micro_train.py                                 |  649 ++++++++++++
 docs/_images/sphx_glr_micro_train_001.png          |  Bin 0 -> 309529 bytes
 docs/_images/sphx_glr_micro_train_thumb.png        |  Bin 0 -> 22866 bytes
 .../how_to/compile_models/from_mxnet.rst.txt       |    2 +-
 .../how_to/compile_models/from_oneflow.rst.txt     |    2 +-
 .../how_to/compile_models/from_paddle.rst.txt      |    2 +-
 .../how_to/compile_models/from_pytorch.rst.txt     |    2 +-
 .../how_to/compile_models/from_tensorflow.rst.txt  |    5 +
 .../compile_models/sg_execution_times.rst.txt      |   22 +-
 .../deploy_models/deploy_model_on_android.rst.txt  |    2 +-
 .../deploy_object_detection_pytorch.rst.txt        |    4 +-
 .../deploy_models/deploy_prequantized.rst.txt      |    6 +-
 .../deploy_prequantized_tflite.rst.txt             |    4 +-
 .../how_to/deploy_models/deploy_quantized.rst.txt  |    2 +-
 .../deploy_models/deploy_ssd_gluoncv.rst.txt       |    4 +-
 .../deploy_models/sg_execution_times.rst.txt       |   18 +-
 .../extend_tvm/bring_your_own_datatypes.rst.txt    |    2 +-
 .../how_to/extend_tvm/sg_execution_times.rst.txt   |   10 +-
 .../how_to/extend_tvm/use_pass_instrument.rst.txt  |   16 +-
 .../optimize_operators/opt_conv_cuda.rst.txt       |    2 +-
 .../optimize_operators/opt_conv_tensorcore.rst.txt |    2 +-
 .../how_to/optimize_operators/opt_gemm.rst.txt     |   16 +-
 .../optimize_operators/sg_execution_times.rst.txt  |    8 +-
 .../sg_execution_times.rst.txt                     |   16 +-
 .../tune_conv2d_layer_cuda.rst.txt                 |    4 +-
 .../tune_network_cuda.rst.txt                      |    2 +-
 .../tune_network_x86.rst.txt                       |    4 +-
 .../tune_sparse_x86.rst.txt                        |    4 +-
 .../tune_with_autotvm/sg_execution_times.rst.txt   |   12 +-
 .../tune_with_autotvm/tune_conv2d_cuda.rst.txt     |   34 +-
 .../how_to/work_with_microtvm/index.rst.txt        |   20 +
 .../work_with_microtvm/micro_autotune.rst.txt      |   16 +-
 .../how_to/work_with_microtvm/micro_train.rst.txt  |  856 ++++++++++++++++
 .../work_with_microtvm/sg_execution_times.rst.txt  |   15 +-
 .../work_with_relay/sg_execution_times.rst.txt     |    8 +-
 .../work_with_schedules/sg_execution_times.rst.txt |   16 +-
 .../how_to/work_with_schedules/tensorize.rst.txt   |    2 +-
 .../tutorials/autotvm/sg_execution_times.rst.txt   |    6 +-
 .../frontend/deploy_classification.rst.txt         |    2 +-
 .../tutorials/frontend/deploy_detection.rst.txt    |    2 +-
 .../tutorials/frontend/sg_execution_times.rst.txt  |    6 +-
 .../tutorials/optimize/sg_execution_times.rst.txt  |    6 +-
 .../topic/vta/tutorials/sg_execution_times.rst.txt |    6 +-
 .../tutorial/auto_scheduler_matmul_x86.rst.txt     |    9 +-
 docs/_sources/tutorial/autotvm_relay_x86.rst.txt   |   54 +-
 .../tutorial/cross_compilation_and_rpc.rst.txt     |    2 +-
 docs/_sources/tutorial/intro_topi.rst.txt          |    2 +-
 docs/_sources/tutorial/sg_execution_times.rst.txt  |   26 +-
 .../tutorial/tensor_expr_get_started.rst.txt       |   49 +-
 docs/commit_hash                                   |    2 +-
 docs/how_to/compile_models/from_mxnet.html         |    2 +-
 docs/how_to/compile_models/from_oneflow.html       |   72 +-
 docs/how_to/compile_models/from_paddle.html        |    2 +-
 docs/how_to/compile_models/from_pytorch.html       |    9 +-
 docs/how_to/compile_models/from_tensorflow.html    |    1 +
 docs/how_to/compile_models/sg_execution_times.html |   22 +-
 .../deploy_models/deploy_model_on_android.html     |    2 +-
 .../deploy_object_detection_pytorch.html           |   21 +-
 docs/how_to/deploy_models/deploy_prequantized.html |    8 +-
 .../deploy_models/deploy_prequantized_tflite.html  |    4 +-
 docs/how_to/deploy_models/deploy_quantized.html    |    2 +-
 docs/how_to/deploy_models/deploy_ssd_gluoncv.html  |   35 +-
 docs/how_to/deploy_models/sg_execution_times.html  |   18 +-
 .../extend_tvm/bring_your_own_datatypes.html       |    2 +-
 docs/how_to/extend_tvm/sg_execution_times.html     |   10 +-
 docs/how_to/extend_tvm/use_pass_instrument.html    |   16 +-
 docs/how_to/optimize_operators/opt_conv_cuda.html  |    2 +-
 .../optimize_operators/opt_conv_tensorcore.html    |    2 +-
 docs/how_to/optimize_operators/opt_gemm.html       |   16 +-
 .../optimize_operators/sg_execution_times.html     |    8 +-
 .../sg_execution_times.html                        |   14 +-
 .../tune_conv2d_layer_cuda.html                    |    4 +-
 .../tune_with_autoscheduler/tune_network_cuda.html |    2 +-
 .../tune_with_autoscheduler/tune_network_x86.html  |    4 +-
 .../tune_with_autoscheduler/tune_sparse_x86.html   |    4 +-
 .../tune_with_autotvm/sg_execution_times.html      |   12 +-
 .../how_to/tune_with_autotvm/tune_conv2d_cuda.html |   34 +-
 docs/how_to/work_with_microtvm/index.html          |   11 +-
 docs/how_to/work_with_microtvm/micro_autotune.html |   17 +-
 docs/how_to/work_with_microtvm/micro_ethosu.html   |    1 +
 .../work_with_microtvm/micro_reference_vm.html     |    1 +
 docs/how_to/work_with_microtvm/micro_tflite.html   |    5 +-
 docs/how_to/work_with_microtvm/micro_train.html    | 1046 ++++++++++++++++++++
 docs/how_to/work_with_microtvm/micro_tvmc.html     |    5 +-
 .../work_with_microtvm/sg_execution_times.html     |   13 +-
 .../how_to/work_with_relay/sg_execution_times.html |    8 +-
 .../work_with_schedules/sg_execution_times.html    |   16 +-
 docs/how_to/work_with_schedules/tensorize.html     |    2 +-
 docs/objects.inv                                   |  Bin 22454 -> 22563 bytes
 docs/reference/api/python/auto_scheduler.html      |    4 +-
 .../api/typedoc/classes/bytestreamreader.html      |   12 +-
 .../api/typedoc/classes/cachedcallstack.html       |   34 +-
 docs/reference/api/typedoc/classes/dldatatype.html |   12 +-
 docs/reference/api/typedoc/classes/dldevice.html   |   10 +-
 .../reference/api/typedoc/classes/environment.html |   12 +-
 docs/reference/api/typedoc/classes/ffilibrary.html |   20 +-
 .../api/typedoc/classes/graphexecutor.html         |   16 +-
 docs/reference/api/typedoc/classes/instance.html   |   40 +-
 docs/reference/api/typedoc/classes/memory.html     |   34 +-
 docs/reference/api/typedoc/classes/module.html     |   10 +-
 docs/reference/api/typedoc/classes/ndarray.html    |   22 +-
 .../api/typedoc/classes/packedfunccell.html        |    6 +-
 docs/reference/api/typedoc/classes/rpcserver.html  |   14 +-
 docs/reference/api/typedoc/classes/scalar.html     |    6 +-
 .../api/typedoc/classes/webgpucontext.html         |   12 +-
 docs/reference/api/typedoc/enums/argtypecode.html  |   30 +-
 .../api/typedoc/enums/aynccallbackcode.html        |    4 +-
 .../api/typedoc/enums/dldatatypecode.html          |    8 +-
 .../api/typedoc/enums/rpcserverstate.html          |   12 +-
 docs/reference/api/typedoc/enums/sizeof.html       |   18 +-
 docs/reference/api/typedoc/index.html              |  112 +--
 .../api/typedoc/interfaces/disposable.html         |    2 +-
 .../api/typedoc/interfaces/functioninfo.html       |    6 +-
 .../api/typedoc/interfaces/libraryprovider.html    |    4 +-
 docs/searchindex.js                                |    2 +-
 .../vta/tutorials/autotvm/sg_execution_times.html  |    6 +-
 .../tutorials/frontend/deploy_classification.html  |    2 +-
 .../vta/tutorials/frontend/deploy_detection.html   |    2 +-
 .../vta/tutorials/frontend/sg_execution_times.html |    6 +-
 .../vta/tutorials/optimize/sg_execution_times.html |    6 +-
 docs/topic/vta/tutorials/sg_execution_times.html   |    6 +-
 docs/tutorial/auto_scheduler_matmul_x86.html       |    4 +-
 docs/tutorial/autotvm_relay_x86.html               |  258 ++---
 docs/tutorial/cross_compilation_and_rpc.html       |    2 +-
 docs/tutorial/intro_topi.html                      |    2 +-
 docs/tutorial/sg_execution_times.html              |   26 +-
 docs/tutorial/tensor_expr_get_started.html         |   45 +-
 128 files changed, 3724 insertions(+), 826 deletions(-)

diff --git a/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb b/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
new file mode 100644
index 000000000..d97c58b12
--- /dev/null
+++ b/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
@@ -0,0 +1,309 @@
+{
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "%matplotlib inline"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n\nTraining Vision Models for microTVM on Arduino\n==============================================\n**Author**: `Gavin Uberti <https://github.com/guberti>`_\n\nThis tutorial shows how MobileNetV1 models can be trained\nto fit on embedded devices, and how those models can be\ndeployed to Arduino using TVM.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "<div class=\"alert alert-info\"><h4>Note</h4><p>This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally\n  using the link at the bottom of this page, or open it online for free using Google Colab.\n  Click the icon below to open in Google Colab.</p></div>\n\n![](https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png)\n\n     :align: center\n     :target: https://colab.research.google.com/gith [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import tensorflow as tf\n\nif not tf.test.gpu_device_name():\n    print(\"No GPU was detected!\")\n    print(\"Model training will take much longer (~30 minutes instead of ~5)\")\nelse:\n    print(\"GPU detected - you're good to go.\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Choosing Our Work Dir\n^^^^^^^^^^^^^^^^^^^^^\nWe need to pick a directory where our image datasets, trained model, and eventual Arduino sketch\nwill all live. If running on Google Colab, we'll save everything in ``/root`` (aka ``~``) but you'll\nprobably want to store it elsewhere if running locally. Note that this variable only affects Python\nscripts - you'll have to adjust the Bash commands too.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import os\n\nFOLDER = \"/root\"\n# sphinx_gallery_start_ignore\nimport tempfile\n\nFOLDER = tempfile.mkdtemp()\n# sphinx_gallery_end_ignore"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Downloading the Data\n--------------------\nConvolutional neural networks usually learn by looking at many images, along with labels telling\nthe network what those images are. To get these images, we'll need a publicly available dataset\nwith thousands of images of all sorts of objects and labels of what's in each image. We'll also\nneed a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.\n\nIn this tutorial, we'll create a model to dete [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import os\nimport shutil\nimport urllib.request\n\n# Download datasets\nos.makedirs(f\"{FOLDER}/images\")\nurllib.request.urlretrieve(\n    \"http://ai.stanford.edu/~jkrause/car196/cars_train.tgz\", f\"{FOLDER}/images/target.tgz\"\n)\nurllib.request.urlretrieve(\n    \"http://images.cocodataset.org/zips/val2017.zip\", f\"{FOLDER}/images/random.zip\"\n)\n\n# Extract them and rename their folders\nshutil.unpack_archive(f\"{FOLDER}/images/target.tgz\", f\"{FOLDER}/images\")\nshutil [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Loading the Data\n----------------\nCurrently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have\nto load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed\ndata. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads\nimages such that each pixel value is a float from 0 to 255.\n\nWe'll also need to load labels, though Keras will help with this. From our subdirectory struc [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "IMAGE_SIZE = (64, 64, 3)\nunscaled_dataset = tf.keras.utils.image_dataset_from_directory(\n    f\"{FOLDER}/images\",\n    batch_size=32,\n    shuffle=True,\n    label_mode=\"categorical\",\n    image_size=IMAGE_SIZE[0:2],\n)\nrescale = tf.keras.layers.Rescaling(scale=1.0 / 255)\nfull_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "What's Inside Our Dataset?\n^^^^^^^^^^^^^^^^^^^^^^^^^^\nBefore giving this data set to our neural network, we ought to give it a quick visual inspection.\nDoes the data look properly transformed? Do the labels seem appropriate? And what's our ratio of\nobjects to other stuff? We can display some examples from our datasets using ``matplotlib``:\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import matplotlib.pyplot as plt\n\nnum_target_class = len(os.listdir(f\"{FOLDER}/images/target/\"))\nnum_random_class = len(os.listdir(f\"{FOLDER}/images/random/\"))\nprint(f\"{FOLDER}/images/target contains {num_target_class} images\")\nprint(f\"{FOLDER}/images/random contains {num_random_class} images\")\n\n# Show some samples and their labels\nSAMPLES_TO_SHOW = 10\nplt.figure(figsize=(20, 10))\nfor i, (image, label) in enumerate(unscaled_dataset.unbatch()):\n    if i >= SAMPL [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Validating our Accuracy\n^^^^^^^^^^^^^^^^^^^^^^^\nWhile developing our model, we'll often want to check how accurate it is (e.g. to see if it\nimproves during training). How do we do this? We could just train it on *all* of the data, and\nthen ask it to classify that same data. However, our model could cheat by just memorizing all of\nthe samples, which would make it *appear* to have very high accuracy, but perform very badly in\nreality. In practice, this \"memorizing\" is call [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "num_batches = len(full_dataset)\ntrain_dataset = full_dataset.take(int(num_batches * 0.8))\nvalidation_dataset = full_dataset.skip(len(train_dataset))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Loading the Data\n----------------\nIn the past decade, `convolutional neural networks <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ have been widely\nadopted for image classification tasks. State-of-the-art models like `EfficientNet V2 <https://arxiv.org/abs/2104.00298>`_ are able\nto perform image classification better than even humans! Unfortunately, these models have tens of\nmillions of parameters, and thus won't fit on cheap security camera computers.\n\nO [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "os.makedirs(f\"{FOLDER}/models\")\nWEIGHTS_PATH = f\"{FOLDER}/models/mobilenet_2_5_128_tf.h5\"\nurllib.request.urlretrieve(\n    \"https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5\",\n    WEIGHTS_PATH,\n)\n\npretrained = tf.keras.applications.MobileNet(\n    input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25\n)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Modifying Our Network\n^^^^^^^^^^^^^^^^^^^^^\nAs mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,\nbut we want to convert it to classify cars. Since only the bottom few layers are task-specific,\nwe'll **cut off the last five layers** of our original model. In their place we'll build our own\n\"tail\" to the model by performing respape, dropout, flatten, and softmax operations.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "model = tf.keras.models.Sequential()\n\nmodel.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))\nmodel.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))\n\nmodel.add(tf.keras.layers.Reshape((-1,)))\nmodel.add(tf.keras.layers.Dropout(0.1))\nmodel.add(tf.keras.layers.Flatten())\nmodel.add(tf.keras.layers.Dense(2, activation=\"softmax\"))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Fine Tuning Our Network\n^^^^^^^^^^^^^^^^^^^^^^^\nWhen training neural networks, we must set a parameter called the **learning rate** that controls\nhow fast our network learns. It must be set carefully - too slow, and our network will take\nforever to train; too fast, and our network won't be able to learn some fine details. Generally\nfor Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's\nrecommended in the `original paper <https://arxiv [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "model.compile(\n    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),\n    loss=\"categorical_crossentropy\",\n    metrics=[\"accuracy\"],\n)\nmodel.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Quantization\n------------\nWe've done a decent job of reducing our model's size so far - changing the input dimension,\nalong with removing the bottom layers reduced the model to just 219k parameters. However, each of\nthese parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!\n\nAdditionally, it might be the case that our hardware doesn't have built-in support for floating\npoint numbers. While most high-memory Arduinos (like the Nano 33  [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "def representative_dataset():\n    for image_batch, label_batch in full_dataset.take(10):\n        yield [image_batch]\n\n\nconverter = tf.lite.TFLiteConverter.from_keras_model(model)\nconverter.optimizations = [tf.lite.Optimize.DEFAULT]\nconverter.representative_dataset = representative_dataset\nconverter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\nconverter.inference_input_type = tf.uint8\nconverter.inference_output_type = tf.uint8\n\nquantized_model = c [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Download the Model if Desired\n^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nWe've now got a finished model that you can use locally or in other tutorials (try autotuning\nthis model or viewing it on `https://netron.app/ <https://netron.app/>`_). But before we do\nthose things, we'll have to write it to a file (``quantized.tflite``). If you're running this\ntutorial on Google Colab, you'll have to uncomment the last two lines to download the file\nafter writing it.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "QUANTIZED_MODEL_PATH = f\"{FOLDER}/models/quantized.tflite\"\nwith open(QUANTIZED_MODEL_PATH, \"wb\") as f:\n    f.write(quantized_model)\n# from google.colab import files\n# files.download(QUANTIZED_MODEL_PATH)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Compiling With TVM For Arduino\n------------------------------\nTensorFlow has a built-in framework for deploying to microcontrollers - `TFLite Micro <https://www.tensorflow.org/lite/microcontrollers>`_. However,\nit's poorly supported by development boards and does not support autotuning. We will use Apache\nTVM instead.\n\nTVM can be used either with its command line interface (``tvmc``) or with its Python interface. The\nPython interface is fully-featured and more stable, so  [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import shutil\nimport tflite\nimport tvm\n\n# Method to load model is different in TFLite 1 vs 2\ntry:  # TFLite 2.1 and above\n    tflite_model = tflite.Model.GetRootAsModel(quantized_model, 0)\nexcept AttributeError:  # Fall back to TFLite 1.14 method\n    tflite_model = tflite.Model.Model.GetRootAsModel(quantized_model, 0)\n\n# Convert to the Relay intermediate representation\nmod, params = tvm.relay.frontend.from_tflite(tflite_model)\n\n# Set configuration flags to improve p [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Testing our Arduino Project\n---------------------------\nConsider the following two 224x224 images from the author's camera roll - one of a car, one not.\nWe will test our Arduino project by loading both of these images and executing the compiled model\non them.\n\n![](https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png)\n\n     :align: center\n     :height: 200px\n     :width: 600px\n\nCurrently, t [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Writing our Arduino Script\n--------------------------\nWe now need a little bit of Arduino code to read the two binary arrays we just generated, run the\nmodel on them, and log the output to the serial monitor. This file will replace ``arduino_sketch.ino``\nas the main file of our sketch. You'll have to copy this code in manually..\n\n    .. code-block:: c\n\n        %%writefile /root/models/project.ino\n        #include \"src/model.h\"\n        #include \"car.c\"\n        #inc [...]
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "shutil.rmtree(f\"{FOLDER}/models/project/build\", ignore_errors=True)\n# sphinx_gallery_start_ignore\nfrom unittest.mock import MagicMock\n\narduino_project = MagicMock()\n# sphinx_gallery_end_ignore\narduino_project.build()\nprint(\"Compilation succeeded!\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Uploading to Our Device\n-----------------------\nThe very last step is uploading our sketch to an Arduino to make sure our code works properly.\nUnfortunately, we can't do that from Google Colab, so we'll have to download our sketch. This is\nsimple enough to do - we'll just turn our project into a `.zip` archive, and call `files.download`.\nIf you're running on Google Colab, you'll have to uncomment the last two lines to download the file\nafter writing it.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "ZIP_FOLDER = f\"{FOLDER}/models/project\"\nshutil.make_archive(ZIP_FOLDER, \"zip\", ZIP_FOLDER)\n# from google.colab import files\n# files.download(f\"{FOLDER}/models/project.zip\")\n# sphinx_gallery_start_ignore\n# Run a few unit tests to make sure the Python code worked\n\n# Ensure transfer learn model was correctly assembled\nassert len(model.layers) == 5\nassert model.count_params() == 219058  # Only 219,058 of these are trainable\n\nassert len(quantized_model) >= 250000  #  [...]
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "From here, we'll need to open it in the Arduino IDE. You'll have to download the IDE as well as\nthe SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may\nhave to change settings to control how much memory you want the board to use.\n\nExpected Results\n^^^^^^^^^^^^^^^^\nIf all works as expected, you should see the following output on a Serial monitor:\n\n    .. code-block::\n\n      Car results:\n      255, 0\n      Other object results:\n  [...]
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.7.5"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py b/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py
new file mode 100644
index 000000000..378fe56d9
--- /dev/null
+++ b/docs/_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py
@@ -0,0 +1,649 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+"""
+
+######################################################################
+# .. note::
+#
+#   This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+#   using the link at the bottom of this page, or open it online for free using Google Colab.
+#   Click the icon below to open in Google Colab.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+#      :align: center
+#      :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+#      :width: 300px
+#
+# Motivation
+# ----------
+# When building IoT devices, we often want them to **see and understand** the world around them.
+# This can take many forms, but often a device will want to know if a certain **kind of
+# object** is in its field of vision.
+#
+# For example, a security camera might look for **people**, so it can decide whether to save a video
+# to memory. A traffic light might look for **cars**, so it can judge which lights should change
+# first. Or a forest camera might look for a **kind of animal**, so researchers can estimate how large
+# the animal population is.
+#
+# To make these devices affordable, we would like them to need only a low-cost processor like the
+# `nRF52840 <https://www.nordicsemi.com/Products/nRF52840>`_ (costing five dollars each on Mouser) or the `RP2040 <https://www.raspberrypi.com/products/rp2040/>`_ (just $1.45 each!).
+#
+# These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI
+# vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will
+# show how these models can be modified to work around this requirement. Then, we will use TVM
+# to compile and deploy it for an Arduino that uses one of these processors.
+#
+# Installing the Prerequisites
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+#
+# This tutorial will use TensorFlow, a widely used machine learning library created by Google,
+# to train the model. TensorFlow is a fairly low-level library, however, so we will use the Keras
+# interface to talk to TensorFlow. We will also use TensorFlow Lite to perform quantization on
+# our model, as TensorFlow by itself does not support this.
+#
+# Once we have our generated model, we will use TVM to compile and test it. To avoid having to
+# build from source, we'll install ``tlcpack`` - a community build of TVM. Lastly, we'll also
+# install ``imagemagick`` and ``curl`` to preprocess data:
+#
+#     .. code-block:: bash
+#
+#       %%bash
+#       pip install -q tensorflow tflite
+#       pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
+#       apt-get -qq install imagemagick curl
+#
+#       # Install Arduino CLI and library for Nano 33 BLE
+#       curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
+#       /content/bin/arduino-cli core update-index
+#       /content/bin/arduino-cli core install arduino:mbed_nano
+#
+# Using the GPU
+# ^^^^^^^^^^^^^
+#
+# This tutorial demonstrates training a neural network, which requires a lot of computing power
+# and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you
+# can enable a GPU by going to **Runtime->Change runtime type** and selecting "GPU" as the hardware
+# accelerator. If you are running locally, you can `follow TensorFlow's guide <https://www.tensorflow.org/guide/gpu>`_ instead.
+#
+# We can test our GPU installation with the following code:
+
+import tensorflow as tf
+
+if not tf.test.gpu_device_name():
+    print("No GPU was detected!")
+    print("Model training will take much longer (~30 minutes instead of ~5)")
+else:
+    print("GPU detected - you're good to go.")
+
+######################################################################
+# Choosing Our Work Dir
+# ^^^^^^^^^^^^^^^^^^^^^
+# We need to pick a directory where our image datasets, trained model, and eventual Arduino sketch
+# will all live. If running on Google Colab, we'll save everything in ``/root`` (aka ``~``) but you'll
+# probably want to store it elsewhere if running locally. Note that this variable only affects Python
+# scripts - you'll have to adjust the Bash commands too.
+
+import os
+
+FOLDER = "/root"
+# sphinx_gallery_start_ignore
+import tempfile
+
+FOLDER = tempfile.mkdtemp()
+# sphinx_gallery_end_ignore
+
+######################################################################
+# Downloading the Data
+# --------------------
+# Convolutional neural networks usually learn by looking at many images, along with labels telling
+# the network what those images are. To get these images, we'll need a publicly available dataset
+# with thousands of images of all sorts of objects and labels of what's in each image. We'll also
+# need a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.
+#
+# In this tutorial, we'll create a model to detect if an image contains a **car**, but you can use
+# whatever category you like! Just change the source URL below to one containing images of another
+# type of object.
+#
+# To get our car images, we'll be downloading the `Stanford Cars dataset <http://ai.stanford.edu/~jkrause/cars/car_dataset.html>`_,
+# which contains 16,185 full color images of cars. We'll also need images of random things that
+# aren't cars, so we'll use the `COCO 2017 <https://cocodataset.org/#home>`_ validation set (it's
+# smaller, and thus faster to download, than the full training set; training on the full set
+# would yield better results). Note that there are some cars in the COCO 2017 data set, but it's
+# a small enough fraction not to matter - just keep in mind that this will drive down our perceived
+# accuracy slightly.
+#
+# We could use the TensorFlow dataloader utilities, but we'll instead do it manually to make sure
+# it's easy to change the datasets being used. We'll end up with the following file hierarchy:
+#
+#     .. code-block::
+#
+#         /root
+#         ├── images
+#         │   ├── target
+#         │   │   ├── 000001.jpg
+#         │   │   │ ...
+#         │   │   └── 016185.jpg
+#         │   ├── target.tgz
+#         │   ├── random
+#         │   │   ├── 000000000139.jpg
+#         │   │   │ ...
+#         │   │   └── 000000581781.jpg
+#         │   └── random.zip
+#
+# We should also note that the Stanford Cars training set has 8k images, while the COCO 2017 validation set is 5k
+# images - it is not a 50/50 split! If we wanted to, we could weight these classes differently
+# during training to correct for this, but training will still work if we ignore it. It should
+# take about **2 minutes** to download the Stanford Cars, while COCO 2017 validation will take
+# **1 minute**.
+
+import os
+import shutil
+import urllib.request
+
+# Download datasets
+os.makedirs(f"{FOLDER}/images")
+urllib.request.urlretrieve(
+    "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", f"{FOLDER}/images/target.tgz"
+)
+urllib.request.urlretrieve(
+    "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip"
+)
+
+# Extract them and rename their folders
+shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
+shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
+shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
+shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
+
+######################################################################
+# Loading the Data
+# ----------------
+# Currently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have
+# to load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed
+# data. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads
+# images such that each pixel value is a float from 0 to 255.
+#
+# We'll also need to load labels, though Keras will help with this. From our subdirectory structure,
+# it knows the images in ``/target`` are one class, and those in ``/random`` another. Setting
+# ``label_mode='categorical'`` tells Keras to convert these into **categorical labels** - a 2x1 vector
+# that's either ``[1, 0]`` for an object of our target class, or ``[0, 1]`` for anything else.
+# We'll also set ``shuffle=True`` to randomize the order of our examples.
+#
+# We will also **batch** the data - grouping samples into clumps to make our training go faster.
+# A ``batch_size`` of 32 is a decent choice.
+#
+# Lastly, in machine learning we generally want our inputs to be small numbers. We'll thus use a
+# ``Rescaling`` layer to change our images such that each pixel is a float between ``0.0`` and ``1.0``,
+# instead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so
+# we'll use a ``lambda`` function.
+
+IMAGE_SIZE = (64, 64, 3)
+unscaled_dataset = tf.keras.utils.image_dataset_from_directory(
+    f"{FOLDER}/images",
+    batch_size=32,
+    shuffle=True,
+    label_mode="categorical",
+    image_size=IMAGE_SIZE[0:2],
+)
+rescale = tf.keras.layers.Rescaling(scale=1.0 / 255)
+full_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))
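+
+# As a quick sanity check (an optional aside), one batch should now contain 32
+# images of shape 64x64x3 with float values in [0, 1], plus 32 one-hot labels:
+for image_batch, label_batch in full_dataset.take(1):
+    print("Images:", image_batch.shape)  # (32, 64, 64, 3)
+    print("Labels:", label_batch.shape)  # (32, 2)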
+
+######################################################################
+# What's Inside Our Dataset?
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^
+# Before giving this data set to our neural network, we ought to give it a quick visual inspection.
+# Does the data look properly transformed? Do the labels seem appropriate? And what's our ratio of
+# objects to other stuff? We can display some examples from our datasets using ``matplotlib``:
+
+import matplotlib.pyplot as plt
+
+num_target_class = len(os.listdir(f"{FOLDER}/images/target/"))
+num_random_class = len(os.listdir(f"{FOLDER}/images/random/"))
+print(f"{FOLDER}/images/target contains {num_target_class} images")
+print(f"{FOLDER}/images/random contains {num_random_class} images")
+
+# Show some samples and their labels
+SAMPLES_TO_SHOW = 10
+plt.figure(figsize=(20, 10))
+for i, (image, label) in enumerate(unscaled_dataset.unbatch()):
+    if i >= SAMPLES_TO_SHOW:
+        break
+    ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)
+    plt.imshow(image.numpy().astype("uint8"))
+    plt.title(list(label.numpy()))
+    plt.axis("off")
+
+######################################################################
+# Validating our Accuracy
+# ^^^^^^^^^^^^^^^^^^^^^^^
+# While developing our model, we'll often want to check how accurate it is (e.g. to see if it
+# improves during training). How do we do this? We could just train it on *all* of the data, and
+# then ask it to classify that same data. However, our model could cheat by just memorizing all of
+# the samples, which would make it *appear* to have very high accuracy, but perform very badly in
+# reality. In practice, this "memorizing" is called **overfitting**.
+#
+# To prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our
+# model will never be trained on validation data - we'll only use it to check our model's accuracy.
+
+num_batches = len(full_dataset)
+train_dataset = full_dataset.take(int(num_batches * 0.8))
+validation_dataset = full_dataset.skip(len(train_dataset))
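+
+# For illustration, we can confirm the split is roughly 80/20 by batch count:
+print(f"{len(train_dataset)} training batches, {len(validation_dataset)} validation batches")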
+
+######################################################################
+# Choosing Our Model
+# ------------------
+# In the past decade, `convolutional neural networks <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ have been widely
+# adopted for image classification tasks. State-of-the-art models like `EfficientNet V2 <https://arxiv.org/abs/2104.00298>`_ are able
+# to perform image classification better than even humans! Unfortunately, these models have tens of
+# millions of parameters, and thus won't fit on cheap security camera computers.
+#
+# Our applications generally don't need perfect accuracy - 90% is good enough. We can thus use the
+# older and smaller MobileNet V1 architecture. But this *still* won't be small enough - by default,
+# MobileNet V1 with 224x224 inputs and alpha 1.0 takes ~50 MB to just **store**. To reduce the size
+# of the model, there are three knobs we can turn. First, we can reduce the size of the input images
+# from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the **alpha**
+# of the model, from 1.0 to 0.25, which downscales the width of the network (and the number of
+# filters) by a factor of four. And if we were really strapped for space, we could reduce the
+# number of **channels** by making our model take grayscale images instead of RGB ones.
+#
+# In this tutorial, we will use an RGB 64x64 input image and alpha 0.25. This is not quite
+# ideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform
+# transfer learning using the official TensorFlow source models (if we used alpha <0.25 or a
+# grayscale input, we wouldn't be able to do this).
+#
+# What is Transfer Learning?
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^
+# Deep learning has `dominated image classification <https://paperswithcode.com/sota/image-classification-on-imagenet>`_ for a long time,
+# but training neural networks takes a lot of time. When a neural network is trained "from scratch",
+# its parameters start out randomly initialized, forcing it to learn very slowly how to tell images
+# apart.
+#
+# With transfer learning, we instead start with a neural network that's **already** good at a
+# specific task. In this example, that task is classifying images from `the ImageNet database <https://www.image-net.org/>`_. This
+# means the network already has some object detection capabilities, and is likely closer to what you
+# want than a random model would be.
+#
+# This works especially well with image processing neural networks like MobileNet. In practice, it
+# turns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for
+# identifying low-level features like lines and shapes - only the last few fully connected layers
+# are used to determine how those shapes make up the objects the network is trying to detect.
+#
+# We can take advantage of this by starting training with a MobileNet model that was trained on
+# ImageNet, and already knows how to identify those lines and shapes. We can then just remove the
+# last few layers from this pretrained model, and add our own final layers. We'll then train this
+# conglomerate model for a few epochs on our cars vs non-cars dataset, to adjust the first layers
+# and train from scratch the last layers. This process of training an already-partially-trained
+# model is called *fine-tuning*.
+#
+# Source MobileNets for transfer learning have been `pretrained by the TensorFlow folks <https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md>`_, so we
+# can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).
+
+os.makedirs(f"{FOLDER}/models")
+WEIGHTS_PATH = f"{FOLDER}/models/mobilenet_2_5_128_tf.h5"
+urllib.request.urlretrieve(
+    "https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5",
+    WEIGHTS_PATH,
+)
+
+pretrained = tf.keras.applications.MobileNet(
+    input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25
+)
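+
+# To see what we'll be cutting off shortly, we can (optionally) list the last
+# five layers of the pretrained network - its ImageNet-specific head:
+for layer in pretrained.layers[-5:]:
+    print(layer.name)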
+
+######################################################################
+# Modifying Our Network
+# ^^^^^^^^^^^^^^^^^^^^^
+# As mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,
+# but we want to convert it to classify cars. Since only the bottom few layers are task-specific,
+# we'll **cut off the last five layers** of our original model. In their place we'll build our own
+# "tail" to the model by performing respape, dropout, flatten, and softmax operations.
+
+model = tf.keras.models.Sequential()
+
+model.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))
+model.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))
+
+model.add(tf.keras.layers.Reshape((-1,)))
+model.add(tf.keras.layers.Dropout(0.1))
+model.add(tf.keras.layers.Flatten())
+model.add(tf.keras.layers.Dense(2, activation="softmax"))
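+
+# A quick look at the assembled model (optional) - the summary should report
+# roughly 219k parameters, a figure we'll return to when quantizing below:
+model.summary()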
+
+######################################################################
+# Fine Tuning Our Network
+# ^^^^^^^^^^^^^^^^^^^^^^^
+# When training neural networks, we must set a parameter called the **learning rate** that controls
+# how fast our network learns. It must be set carefully - too slow, and our network will take
+# forever to train; too fast, and our network won't be able to learn some fine details. Generally
+# for Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's
+# recommended in the `original paper <https://arxiv.org/abs/1412.6980>`_). However, in this case
+# ``0.0005`` seems to work a little better.
+#
+# We'll also pass the validation set from earlier to ``model.fit``. This will evaluate how good our
+# model is after each training epoch, and let us track how our model is improving. Once training is
+# finished, the model should have a validation accuracy around ``0.98`` (meaning it was right 98% of
+# the time on our validation set).
+
+model.compile(
+    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
+    loss="categorical_crossentropy",
+    metrics=["accuracy"],
+)
+model.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)
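+
+# (Optional) An explicit final check of validation accuracy - expect ~0.98.
+# Uncomment to run:
+# loss, accuracy = model.evaluate(validation_dataset, verbose=0)
+# print(f"Validation accuracy: {accuracy:.3f}")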
+
+######################################################################
+# Quantization
+# ------------
+# We've done a decent job of reducing our model's size so far - changing the input dimension,
+# along with removing the bottom layers, reduced the model to just 219k parameters. However, each of
+# these parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!
+#
+# Additionally, it might be the case that our hardware doesn't have built-in support for floating
+# point numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support,
+# some others (like the Arduino Due) do not. On any boards *without* dedicated hardware support,
+# floating point multiplication will be extremely slow.
+#
+# To address both issues we will **quantize** the model - representing the weights as eight bit
+# integers. It's more complex than just rounding, though - to get the best performance, TensorFlow
+# tracks how each neuron in our model activates, so we can figure out how to most accurately simulate
+# the neuron's original activations with integer operations.
+#
+# We will help TensorFlow do this by creating a representative dataset - a subset of the original
+# that is used for tracking how those neurons activate. We'll then pass this into a ``TFLiteConverter``
+# (Keras itself does not have quantization support) with an ``Optimize`` flag to tell TFLite to perform
+# the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must
+# explicitly tell it to avoid this behavior.
+
+
+def representative_dataset():
+    for image_batch, label_batch in full_dataset.take(10):
+        yield [image_batch]
+
+
+converter = tf.lite.TFLiteConverter.from_keras_model(model)
+converter.optimizations = [tf.lite.Optimize.DEFAULT]
+converter.representative_dataset = representative_dataset
+converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
+converter.inference_input_type = tf.uint8
+converter.inference_output_type = tf.uint8
+
+quantized_model = converter.convert()
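+
+# As a rough check (illustrative), the int8 flatbuffer should now be a few
+# hundred KB - down from the nearly 1 MB of float32 weights discussed above:
+print(f"Quantized model is {len(quantized_model) / 1024:.1f} KB")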
+
+######################################################################
+# Download the Model if Desired
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+# We've now got a finished model that you can use locally or in other tutorials (try autotuning
+# this model or viewing it on `https://netron.app/ <https://netron.app/>`_). But before we do
+# those things, we'll have to write it to a file (``quantized.tflite``). If you're running this
+# tutorial on Google Colab, you'll have to uncomment the last two lines to download the file
+# after writing it.
+
+QUANTIZED_MODEL_PATH = f"{FOLDER}/models/quantized.tflite"
+with open(QUANTIZED_MODEL_PATH, "wb") as f:
+    f.write(quantized_model)
+# from google.colab import files
+# files.download(QUANTIZED_MODEL_PATH)
+
+######################################################################
+# Compiling With TVM For Arduino
+# ------------------------------
+# TensorFlow has a built-in framework for deploying to microcontrollers - `TFLite Micro <https://www.tensorflow.org/lite/microcontrollers>`_. However,
+# it's poorly supported by development boards and does not support autotuning. We will use Apache
+# TVM instead.
+#
+# TVM can be used either with its command line interface (``tvmc``) or with its Python interface. The
+# Python interface is fully-featured and more stable, so we'll use it here.
+#
+# TVM is an optimizing compiler, and optimizations to our model are performed in stages via
+# **intermediate representations**. The first of these is `Relay <https://arxiv.org/abs/1810.00952>`_, a high-level intermediate
+# representation emphasizing portability. The conversion from ``.tflite`` to Relay is done without any
+# knowledge of our "end goal" - the fact we intend to run this model on an Arduino.
+#
+# Choosing an Arduino Board
+# ^^^^^^^^^^^^^^^^^^^^^^^^^
+# Next, we'll have to decide exactly which Arduino board to use. The Arduino sketch that we
+# ultimately generate should be compatible with any board, but knowing which board we are using in
+# advance allows TVM to adjust its compilation strategy to get better performance.
+#
+# There is one catch - we need enough **memory** (flash and RAM) to be able to run our model. We
+# won't ever be able to run a complex vision model like a MobileNet on an Arduino Uno - that board
+# only has 2 KB of RAM and 32 KB of flash! Our model has ~200,000 parameters, so there is just no
+# way it could fit.
+#
+# For this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.
+# However, any other Arduino with those specs or better should also work.
+#
+# Generating our project
+# ^^^^^^^^^^^^^^^^^^^^^^
+# Next, we'll compile the model to TVM's MLF (Model Library Format) intermediate representation,
+# which consists of C/C++ code and is designed for autotuning. To improve performance, we'll tell
+# TVM that we're compiling for the ``nrf52840`` microprocessor (the one the Nano 33 BLE uses). We'll
+# also tell it to use the C runtime (abbreviated ``crt``) and to use the ahead-of-time executor
+# (abbreviated ``aot``, which helps reduce the model's memory footprint). Lastly, we will disable
+# vectorization with ``"tir.disable_vectorize": True``, as C has no native vectorized types.
+#
+# Once we have set these configuration parameters, we will call ``tvm.relay.build`` to compile our
+# Relay model into the MLF intermediate representation. From here, we just need to call
+# ``tvm.micro.generate_project`` and pass in the Arduino template project to finish compilation.
+
+import shutil
+import tflite
+import tvm
+
+# Method to load model is different in TFLite 1 vs 2
+try:  # TFLite 2.1 and above
+    tflite_model = tflite.Model.GetRootAsModel(quantized_model, 0)
+except AttributeError:  # Fall back to TFLite 1.14 method
+    tflite_model = tflite.Model.Model.GetRootAsModel(quantized_model, 0)
+
+# Convert to the Relay intermediate representation
+mod, params = tvm.relay.frontend.from_tflite(tflite_model)
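+
+# (Optional) Uncomment to inspect the imported Relay IR before compiling -
+# useful for verifying the conversion, though the printout is long:
+# print(mod)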
+
+# Set configuration flags to improve performance
+target = tvm.target.target.micro("nrf52840")
+runtime = tvm.relay.backend.Runtime("crt")
+executor = tvm.relay.backend.Executor("aot", {"unpacked-api": True})
+
+# Convert to the MLF intermediate representation
+with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
+    mod = tvm.relay.build(mod, target, runtime=runtime, executor=executor, params=params)
+
+# Generate an Arduino project from the MLF intermediate representation
+shutil.rmtree(f"{FOLDER}/models/project", ignore_errors=True)
+arduino_project = tvm.micro.generate_project(
+    tvm.micro.get_microtvm_template_projects("arduino"),
+    mod,
+    f"{FOLDER}/models/project",
+    {
+        "arduino_board": "nano33ble",
+        "arduino_cli_cmd": "/content/bin/arduino-cli",
+        "project_type": "example_project",
+    },
+)
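+
+# (Optional) Peek at the generated project - it should now hold the sketch's
+# main file alongside the compiled model sources:
+print(os.listdir(f"{FOLDER}/models/project"))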
+
+######################################################################
+# Testing our Arduino Project
+# ---------------------------
+# Consider the following two 224x224 images from the author's camera roll - one of a car, one not.
+# We will test our Arduino project by loading both of these images and executing the compiled model
+# on them.
+#
+# .. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png
+#      :align: center
+#      :height: 200px
+#      :width: 600px
+#
+# Currently, these are 224x224 PNG images we can download from Imgur. Before we can feed in these
+# images, we'll need to resize and convert them to raw data, which can be done with ``imagemagick``.
+#
+# It's also challenging to load raw data onto an Arduino, as only C/C++ files (and similar) are
+# compiled. We can work around this by embedding our raw data in a hard-coded C array with the
+# built-in utility ``bin2c`` that will output a file like below:
+#
+#     .. code-block:: c
+#
+#       static const unsigned char CAR_IMAGE[] = {
+#         0x22,0x23,0x14,0x22,
+#         ...
+#         0x07,0x0e,0x08,0x08
+#       };
+#
+# We can do both of these things with a few lines of Bash code:
+#
+#     .. code-block:: bash
+#
+#       %%bash
+#       mkdir -p ~/tests
+#       curl "https://i.imgur.com/JBbEhxN.png" -o ~/tests/car_224.png
+#       convert ~/tests/car_224.png -resize 64 ~/tests/car_64.png
+#       stream ~/tests/car_64.png ~/tests/car.raw
+#       bin2c -c -st ~/tests/car.raw --name CAR_IMAGE > ~/models/project/car.c
+#
+#       curl "https://i.imgur.com/wkh7Dx2.png" -o ~/tests/catan_224.png
+#       convert ~/tests/catan_224.png -resize 64 ~/tests/catan_64.png
+#       stream ~/tests/catan_64.png ~/tests/catan.raw
+#       bin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE > ~/models/project/catan.c
+
+######################################################################
+# Writing our Arduino Script
+# --------------------------
+# We now need a little bit of Arduino code to read the two binary arrays we just generated, run the
+# model on them, and log the output to the serial monitor. This file will replace ``arduino_sketch.ino``
+# as the main file of our sketch. You'll have to copy this code in manually.
+#
+#     .. code-block:: c
+#
+#         %%writefile /root/models/project.ino
+#         #include "src/model.h"
+#         #include "car.c"
+#         #include "catan.c"
+#
+#         void setup() {
+#           Serial.begin(9600);
+#           TVMInitialize();
+#         }
+#
+#         void loop() {
+#           uint8_t result_data[2];
+#           Serial.println("Car results:");
+#           TVMExecute(const_cast<uint8_t*>(CAR_IMAGE), result_data);
+#           Serial.print(result_data[0]); Serial.print(", ");
+#           Serial.print(result_data[1]); Serial.println();
+#
+#           Serial.println("Other object results:");
+#           TVMExecute(const_cast<uint8_t*>(CATAN_IMAGE), result_data);
+#           Serial.print(result_data[0]); Serial.print(", ");
+#           Serial.print(result_data[1]); Serial.println();
+#
+#           delay(1000);
+#         }
+#
+# Compiling Our Code
+# ^^^^^^^^^^^^^^^^^^
+# Now that our project has been generated, TVM's job is mostly done! We can still call
+# ``arduino_project.build()`` and ``arduino_project.upload()``, but these just use ``arduino-cli``'s
+# compile and flash commands underneath. We could also begin autotuning our model, but that's a
+# subject for a different tutorial. To finish up, we'll verify no compiler errors are thrown
+# by our project:
+
+shutil.rmtree(f"{FOLDER}/models/project/build", ignore_errors=True)
+# sphinx_gallery_start_ignore
+from unittest.mock import MagicMock
+
+arduino_project = MagicMock()
+# sphinx_gallery_end_ignore
+arduino_project.build()
+print("Compilation succeeded!")
+
+######################################################################
+# Uploading to Our Device
+# -----------------------
+# The very last step is uploading our sketch to an Arduino to make sure our code works properly.
+# Unfortunately, we can't do that from Google Colab, so we'll have to download our sketch. This is
+# simple enough to do - we'll just turn our project into a ``.zip`` archive and call ``files.download``.
+# If you're running on Google Colab, you'll have to uncomment the last two lines to download the file
+# after writing it.
+
+ZIP_FOLDER = f"{FOLDER}/models/project"
+shutil.make_archive(ZIP_FOLDER, "zip", ZIP_FOLDER)
+# from google.colab import files
+# files.download(f"{FOLDER}/models/project.zip")
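+
+# As an optional sanity check (an extra step, not part of the original flow), we can list what
+# went into the archive using Python's built-in ``zipfile`` module:
+import zipfile
+
+with zipfile.ZipFile(f"{FOLDER}/models/project.zip") as archive:
+    print("\n".join(archive.namelist()))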
+# sphinx_gallery_start_ignore
+# Run a few unit tests to make sure the Python code worked
+
+# Ensure transfer learn model was correctly assembled
+assert len(model.layers) == 5
+assert model.count_params() == 219058  # Only 219,058 of these are trainable
+
+assert len(quantized_model) >= 250000  # Quantized model will be 250 KB - 350 KB
+assert len(quantized_model) <= 350000  # Exact value depends on quantization
+
+# Assert .tflite and .zip files were written to disk
+assert os.path.isfile(f"{FOLDER}/models/quantized.tflite")
+assert os.path.isfile(f"{FOLDER}/models/project.zip")
+
+# Assert MLF file was correctly generated
+assert str(mod.executor) == "aot"
+
+# Remove the temporary folder we generated at the beginning
+shutil.rmtree(FOLDER)
+# sphinx_gallery_end_ignore
+
+
+######################################################################
+# From here, we'll need to open the downloaded sketch in the Arduino IDE. You'll have to download
+# the IDE as well as the SDK for whichever board you are using. For certain boards, like the Sony
+# SPRESENSE, you may have to change settings to control how much memory you want the board to use.
+#
+# Expected Results
+# ^^^^^^^^^^^^^^^^
+# If everything works as expected, you should see the following output on the serial monitor:
+#
+#     .. code-block::
+#
+#       Car results:
+#       255, 0
+#       Other object results:
+#       0, 255
+#
+# The first number is the model's confidence that the object **is** a car, on a scale from 0 to
+# 255. The second is its confidence that the object **is not** a car, on the same scale. These
+# results mean the model is very sure that the first image is a car and that the second is not -
+# which is correct. Hence, our model is working!
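+#
+# As an optional host-side check, the serial output can also be read programmatically. The snippet
+# below is only a sketch: it uses ``pyserial``, which this tutorial doesn't otherwise depend on,
+# and ``/dev/ttyACM0`` is a placeholder for whatever port your board enumerates as:
+#
+#     .. code-block:: python
+#
+#       import serial  # provided by the pyserial package
+#
+#       with serial.Serial("/dev/ttyACM0", 9600, timeout=5) as port:
+#           for _ in range(4):  # two label lines + two result lines
+#               line = port.readline().decode().strip()
+#               if "," in line:
+#                   car, not_car = (int(x) for x in line.split(","))
+#                   total = (car + not_car) or 1  # guard against dividing by zero
+#                   print(f"P(car) = {car / total:.2f}")
+#               else:
+#                   print(line)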
+#
+# Summary
+# -------
+# In this tutorial, we used transfer learning to quickly train an image recognition model to
+# identify cars. We modified its input dimensions and last few layers to make it better suited to
+# this task, and to make it faster and smaller. We then quantized the model and compiled it with
+# TVM to create an Arduino sketch. Lastly, we tested the model on two static images to verify that
+# it works as intended.
+#
+# Next Steps
+# ^^^^^^^^^^
+# From here, we could modify the model to read live images from the camera - we have another
+# Arduino tutorial showing how to do that `on GitHub <https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection>`_. Alternatively, we could
+# `use TVM's autotuning capabilities <https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html>`_ to dramatically improve the model's performance.
+#
diff --git a/docs/_images/sphx_glr_micro_train_001.png b/docs/_images/sphx_glr_micro_train_001.png
new file mode 100644
index 000000000..ffe79049b
Binary files /dev/null and b/docs/_images/sphx_glr_micro_train_001.png differ
diff --git a/docs/_images/sphx_glr_micro_train_thumb.png b/docs/_images/sphx_glr_micro_train_thumb.png
new file mode 100644
index 000000000..cf2ccbf9f
Binary files /dev/null and b/docs/_images/sphx_glr_micro_train_thumb.png differ
diff --git a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
index 2c7215729..e32cc7445 100644
--- a/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_mxnet.rst.txt
@@ -98,7 +98,7 @@ In this section, we download a pretrained imagenet model and classify an image.
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip4b03aae9-9176-4f2b-b0fa-6faad5980f7a from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+    Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zipc9b717e6-28d6-45f6-8568-3018b30c9f29 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
     x (1, 3, 224, 224)
 
 
diff --git a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
index 77e094154..411ff9121 100644
--- a/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_oneflow.rst.txt
@@ -100,7 +100,7 @@ Load a pretrained OneFlow model and save model
  .. code-block:: none
 
     Downloading: "https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip" to /workspace/.oneflow/flowvision_cache/resnet18.zip
-
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
      0%|          | 16.0k/41.5M [00:00<08:15, 87.7kB/s]
      0%|          | 48.0k/41.5M [00:00<05:12, 139kB/s] 
      0%|          | 96.0k/41.5M [00:00<03:42, 195kB/s]
      0%|          | 160k/41.5M [00:00<02:49, 256kB/s] 
      1%|          | 272k/41.5M [00:00<01:52, 384kB/s]
      1%|1         | 512k/41.5M [00:01<01:01, 700kB/s]
      2%|2         | 984k/41.5M [00:01<00:32, 1.31MB/s]
      5%|4         | 1.89M/41.5M [00:01<00:16, 2.55MB/s]
      8%|8         | 3.38M/41.5M [00:01<00:09, 4.36MB/s]
     12%|#1        | 4.87M/41.5M [00:01<00:06, 5.58MB/s]
     15%|#5        | 6.36M/41.5M [00:02<00:05, 6.42MB/s]
     19%|#8        | 7.84M/41.5M [00:02<00:05, 7.00MB/s]
     23%|##2       | 9.34M/41.5M [00:02<00:04, 7.40MB/s]
     26%|##6       | 10.8M/41.5M [00:02<00:04, 7.68MB/s]
     30%|##9       | 12.3M/41.5M [00:02<00:03, 7.88MB/s]
     33%|###3      | 13.8M/41.5M [00:02<00:03, 8.01MB/s]
     37%|###6      | 15.3M/41.5M [00:03<00:
 03, 8.11MB/s]
     40%|####      | 16.8M/41.5M [00:03<00:03, 8.17MB/s]
     44%|####4     | 18.3M/41.5M [00:03<00:02, 8.21MB/s]
     48%|####7     | 19.8M/41.5M [00:03<00:02, 8.25MB/s]
     51%|#####1    | 21.2M/41.5M [00:03<00:02, 8.27MB/s]
     55%|#####4    | 22.7M/41.5M [00:04<00:02, 8.28MB/s]
     58%|#####8    | 24.2M/41.5M [00:04<00:02, 8.30MB/s]
     62%|######1   | 25.7M/41.5M [00:04<00:01, 8.30MB/s]
     66%|######5   | 27.2M/41.5M [00:04<00:01, 8.29MB/s]
     69%|######9   | 28.7M/41.5M [00:04<00:01, 8.28MB/s]
     73%|#######2  | 30.1M/41.5M [00:05<00:01, 8.29MB/s]
     75%|#######4  | 31.1M/41.5M [00:05<00:01, 7.26MB/s]
     78%|#######8  | 32.6M/41.5M [00:05<00:01, 7.58MB/s]
     82%|########2 | 34.1M/41.5M [00:05<00:00, 7.81MB/s]
     85%|########5 | 35.4M/41.5M [00:06<00:01, 5.98MB/s]
     89%|########8 | 36.9M/41.5M [00:06<00:00, 6.56MB/s]
     91%|######### | 37.7M/41.5M [00:06<00:00, 5.95MB/s]
     94%|#########4| 39.2M/41.5M [00:06<00:00, 6.59MB/s]
     97%|#####
 ####6| 40.2M/41.5M [00:06<00:00, 6.41MB/s]
     98%|#########8| 40.9M/41.5M [00:06<00:00, 5.56MB/s]
    100%|##########| 41.5M/41.5M [00:06<00:00, 6.26MB/s]
+
      0%|          | 0.00/41.5M [00:00<?, ?B/s]
      0%|          | 16.0k/41.5M [00:00<07:31, 96.3kB/s]
      0%|          | 48.0k/41.5M [00:00<04:45, 152kB/s] 
      0%|          | 96.0k/41.5M [00:00<03:22, 214kB/s]
      0%|          | 168k/41.5M [00:00<02:24, 300kB/s] 
      1%|          | 352k/41.5M [00:00<01:13, 589kB/s]
      1%|1         | 616k/41.5M [00:01<00:46, 926kB/s]
      3%|2         | 1.22M/41.5M [00:01<00:22, 1.86MB/s]
      6%|5         | 2.45M/41.5M [00:01<00:11, 3.66MB/s]
     10%|9         | 3.95M/41.5M [00:01<00:07, 5.39MB/s]
     13%|#3        | 5.44M/41.5M [00:01<00:05, 6.55MB/s]
     17%|#6        | 6.94M/41.5M [00:01<00:04, 7.35MB/s]
     20%|##        | 8.44M/41.5M [00:02<00:04, 7.90MB/s]
     24%|##3       | 9.92M/41.5M [00:02<00:04, 8.27MB/s]
     28%|##7       | 11.4M/41.5M [00:02<00:03, 8.54MB/s]
     31%|###1      | 12.9M/41.5M [00:02<00:03, 8.72MB/s]
     35%|###4      | 14.4M/41.5M [00:02<00:03, 8.82MB/s]
     38%|###8      | 15.9M/41.5M [00:02<00
 :03, 8.92MB/s]
     42%|####1     | 17.4M/41.5M [00:03<00:02, 8.99MB/s]
     45%|####5     | 18.9M/41.5M [00:03<00:02, 9.03MB/s]
     49%|####9     | 20.4M/41.5M [00:03<00:02, 9.05MB/s]
     53%|#####2    | 21.9M/41.5M [00:03<00:02, 9.09MB/s]
     56%|#####6    | 23.3M/41.5M [00:03<00:02, 9.09MB/s]
     60%|#####9    | 24.8M/41.5M [00:03<00:01, 9.10MB/s]
     63%|######3   | 26.3M/41.5M [00:04<00:01, 9.11MB/s]
     67%|######7   | 27.8M/41.5M [00:04<00:01, 9.12MB/s]
     71%|#######   | 29.3M/41.5M [00:04<00:01, 9.11MB/s]
     74%|#######4  | 30.8M/41.5M [00:04<00:01, 9.13MB/s]
     78%|#######7  | 32.3M/41.5M [00:04<00:01, 9.12MB/s]
     81%|########1 | 33.8M/41.5M [00:04<00:00, 9.14MB/s]
     85%|########5 | 35.3M/41.5M [00:05<00:00, 9.12MB/s]
     89%|########8 | 36.8M/41.5M [00:05<00:00, 9.11MB/s]
     92%|#########2| 38.3M/41.5M [00:05<00:00, 9.13MB/s]
     96%|#########5| 39.7M/41.5M [00:05<00:00, 9.12MB/s]
     99%|#########9| 41.2M/41.5M [00:05<00:00, 9.10MB/s]
    100%|####
 ######| 41.5M/41.5M [00:05<00:00, 7.47MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_paddle.rst.txt b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
index ddf816bdf..9afc12fb2 100644
--- a/docs/_sources/how_to/compile_models/from_paddle.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_paddle.rst.txt
@@ -210,7 +210,7 @@ Look up prediction top 1 index in 1000 class synset.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  7.154 seconds)
+   **Total running time of the script:** ( 1 minutes  6.253 seconds)
 
 
 .. _sphx_glr_download_how_to_compile_models_from_paddle.py:
diff --git a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
index c5d78aba6..b330081ca 100644
--- a/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_pytorch.rst.txt
@@ -79,7 +79,7 @@ Load a pretrained PyTorch model
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
-
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
      8%|7         | 3.53M/44.7M [00:00<00:01, 36.7MB/s]
     16%|#5        | 7.04M/44.7M [00:00<00:01, 35.9MB/s]
     41%|####1     | 18.3M/44.7M [00:00<00:00, 72.4MB/s]
     79%|#######8  | 35.1M/44.7M [00:00<00:00, 113MB/s] 
    100%|##########| 44.7M/44.7M [00:00<00:00, 105MB/s]
+
      0%|          | 0.00/44.7M [00:00<?, ?B/s]
      9%|8         | 3.96M/44.7M [00:00<00:01, 41.2MB/s]
     18%|#7        | 7.90M/44.7M [00:00<00:00, 40.5MB/s]
     62%|######1   | 27.7M/44.7M [00:00<00:00, 115MB/s] 
    100%|##########| 44.7M/44.7M [00:00<00:00, 120MB/s]
 
 
 
diff --git a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
index fa7a92713..c58719675 100644
--- a/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
+++ b/docs/_sources/how_to/compile_models/from_tensorflow.rst.txt
@@ -379,6 +379,11 @@ Run the corresponding model on tensorflow
 
 
 
+.. rst-class:: sphx-glr-timing
+
+   **Total running time of the script:** ( 1 minutes  6.492 seconds)
+
+
 .. _sphx_glr_download_how_to_compile_models_from_tensorflow.py:
 
 
diff --git a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
index 8f18a141f..58717e3c9 100644
--- a/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/compile_models/sg_execution_times.rst.txt
@@ -5,15 +5,15 @@
 
 Computation times
 =================
-**05:19.617** total execution time for **how_to_compile_models** files:
+**05:55.351** total execution time for **how_to_compile_models** files:
 
-- **01:07.154**: :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)
-- **00:59.786**: :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``)
-- **00:57.244**: :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)
-- **00:32.067**: :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)
-- **00:23.948**: :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)
-- **00:22.611**: :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)
-- **00:20.958**: :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)
-- **00:19.583**: :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)
-- **00:13.801**: :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)
-- **00:02.463**: :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)
+- **01:06.492**: :ref:`sphx_glr_how_to_compile_models_from_tensorflow.py` (``from_tensorflow.py``)
+- **01:06.253**: :ref:`sphx_glr_how_to_compile_models_from_paddle.py` (``from_paddle.py``)
+- **00:57.441**: :ref:`sphx_glr_how_to_compile_models_from_darknet.py` (``from_darknet.py``)
+- **00:38.879**: :ref:`sphx_glr_how_to_compile_models_from_tflite.py` (``from_tflite.py``)
+- **00:31.139**: :ref:`sphx_glr_how_to_compile_models_from_oneflow.py` (``from_oneflow.py``)
+- **00:28.658**: :ref:`sphx_glr_how_to_compile_models_from_keras.py` (``from_keras.py``)
+- **00:22.838**: :ref:`sphx_glr_how_to_compile_models_from_mxnet.py` (``from_mxnet.py``)
+- **00:21.267**: :ref:`sphx_glr_how_to_compile_models_from_coreml.py` (``from_coreml.py``)
+- **00:19.939**: :ref:`sphx_glr_how_to_compile_models_from_pytorch.py` (``from_pytorch.py``)
+- **00:02.444**: :ref:`sphx_glr_how_to_compile_models_from_onnx.py` (``from_onnx.py``)
diff --git a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
index 9344af4a4..d0901ad7a 100644
--- a/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_model_on_android.rst.txt
@@ -402,7 +402,7 @@ Execute on TVM
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      15.5115      15.4647      15.7794      15.4293       0.1051   
+      15.6616      15.5844      15.9253      15.4984       0.1454   
                
 
 
diff --git a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
index 7501ff3b0..92d48b9af 100644
--- a/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_object_detection_pytorch.rst.txt
@@ -108,7 +108,7 @@ Load pre-trained maskrcnn from torchvision and do tracing
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth" to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
-
      0%|          | 0.00/170M [00:00<?, ?B/s]
      2%|1         | 3.00M/170M [00:00<00:05, 31.5MB/s]
      4%|3         | 6.50M/170M [00:00<00:04, 34.5MB/s]
     16%|#6        | 27.8M/170M [00:00<00:01, 120MB/s] 
     25%|##5       | 42.8M/170M [00:00<00:00, 135MB/s]
     40%|####      | 68.1M/170M [00:00<00:00, 182MB/s]
     50%|#####     | 85.4M/170M [00:00<00:00, 167MB/s]
     66%|######6   | 112M/170M [00:00<00:00, 202MB/s] 
     78%|#######7  | 132M/170M [00:00<00:00, 193MB/s]
     89%|########8 | 150M/170M [00:00<00:00, 184MB/s]
    100%|##########| 170M/170M [00:01<00:00, 170MB/s]
+
      0%|          | 0.00/170M [00:00<?, ?B/s]
      2%|1         | 3.33M/170M [00:00<00:05, 34.7MB/s]
      4%|4         | 7.15M/170M [00:00<00:04, 37.8MB/s]
     18%|#8        | 31.2M/170M [00:00<00:01, 135MB/s] 
     34%|###3      | 57.1M/170M [00:00<00:00, 189MB/s]
     49%|####8     | 83.0M/170M [00:00<00:00, 219MB/s]
     64%|######3   | 109M/170M [00:00<00:00, 236MB/s] 
     79%|#######9  | 134M/170M [00:00<00:00, 247MB/s]
     94%|#########3| 160M/170M [00:00<00:00, 253MB/s]
    100%|##########| 170M/170M [00:00<00:00, 212MB/s]
     /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
       for i in range(dim)
     /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
@@ -262,7 +262,7 @@ Get boxes with score larger than 0.9
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  52.190 seconds)
+   **Total running time of the script:** ( 2 minutes  55.555 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_object_detection_pytorch.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
index 82aabacf8..f0322ae32 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized.rst.txt
@@ -187,7 +187,7 @@ training. Other models require a full post training calibration.
  .. code-block:: none
 
     Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
-
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 170MB/s]
+
      0%|          | 0.00/13.6M [00:00<?, ?B/s]
     27%|##7       | 3.69M/13.6M [00:00<00:00, 38.4MB/s]
     54%|#####4    | 7.36M/13.6M [00:00<00:00, 34.1MB/s]
    100%|##########| 13.6M/13.6M [00:00<00:00, 56.7MB/s]
 
 
 
@@ -353,7 +353,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      90.2020      90.1611      91.4664      89.9314       0.2113   
+      90.6288      90.3101      109.3293     90.1389       1.9611   
                
 
 
@@ -393,7 +393,7 @@ TODO
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  5.324 seconds)
+   **Total running time of the script:** ( 1 minutes  7.105 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
index 4d8fe32e1..fe3455434 100644
--- a/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_prequantized_tflite.rst.txt
@@ -360,7 +360,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      117.2626     116.6744     133.9423     115.2772      2.1643   
+      119.3850     119.3810     120.4711     118.5099      0.3716   
                
 
 
@@ -394,7 +394,7 @@ Here we give an example of how to measure performance of TVM compiled models.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  4.665 seconds)
+   **Total running time of the script:** ( 1 minutes  58.792 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_prequantized_tflite.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
index 0c2796b82..fba0b5767 100644
--- a/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_quantized.rst.txt
@@ -223,7 +223,7 @@ We create a Relay VM to build and execute the model.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  28.814 seconds)
+   **Total running time of the script:** ( 1 minutes  27.202 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_quantized.py:
diff --git a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
index 2fc9080f0..4e2390c08 100644
--- a/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
+++ b/docs/_sources/how_to/deploy_models/deploy_ssd_gluoncv.rst.txt
@@ -137,7 +137,7 @@ Convert and compile model for CPU.
             data: None
       input_sym_arg_type = in_param.infer_type()[0]
     Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
-
      0%|          | 0/132723 [00:00<?, ?KB/s]
      5%|4         | 6076/132723 [00:00<00:02, 60755.86KB/s]
     11%|#1        | 14687/132723 [00:00<00:01, 75651.01KB/s]
     18%|#7        | 23341/132723 [00:00<00:01, 80618.32KB/s]
     24%|##4       | 32060/132723 [00:00<00:01, 83209.49KB/s]
     31%|###       | 40689/132723 [00:00<00:01, 84317.29KB/s]
     37%|###7      | 49378/132723 [00:00<00:00, 85188.79KB/s]
     44%|####3     | 57961/132723 [00:00<00:00, 85397.29KB/s]
     50%|#####     | 66554/132723 [00:00<00:00, 85563.32KB/s]
     57%|#####6    | 75244/132723 [00:00<00:00, 85979.77KB/s]
     63%|######3   | 83950/132723 [00:01<00:00, 86310.28KB/s]
     70%|######9   | 92634/132723 [00:01<00:00, 86470.65KB/s]
     76%|#######6  | 101328/132723 [00:01<00:00, 86610.87KB/s]
     83%|########2 | 110062/132723 [00:01<00:00, 86830.63KB/s]
     89%|########9 | 118777/132723 [00:01<00:00, 86923.61KB/s]
     96%|#########6| 127514/132723 [00:01<00:00, 87055.62KB/s]
    100%|#######
 ###| 132723/132723 [00:01<00:00, 85017.75KB/s]
+
      0%|          | 0/132723 [00:00<?, ?KB/s]
      4%|4         | 5835/132723 [00:00<00:02, 58343.66KB/s]
     10%|#         | 13882/132723 [00:00<00:01, 71356.41KB/s]
     17%|#6        | 21949/132723 [00:00<00:01, 75603.29KB/s]
     23%|##2       | 30108/132723 [00:00<00:01, 77964.00KB/s]
     29%|##8       | 37905/132723 [00:00<00:01, 75858.11KB/s]
     35%|###4      | 46068/132723 [00:00<00:01, 77782.99KB/s]
     41%|####      | 54213/132723 [00:00<00:00, 78961.94KB/s]
     47%|####6     | 62282/132723 [00:00<00:00, 79505.79KB/s]
     53%|#####3    | 70354/132723 [00:00<00:00, 79880.70KB/s]
     59%|#####9    | 78485/132723 [00:01<00:00, 80311.83KB/s]
     65%|######5   | 86542/132723 [00:01<00:00, 80388.24KB/s]
     71%|#######1  | 94687/132723 [00:01<00:00, 80709.03KB/s]
     77%|#######7  | 102760/132723 [00:01<00:00, 79536.35KB/s]
     83%|########3 | 110719/132723 [00:01<00:00, 79174.39KB/s]
     89%|########9 | 118640/132723 [00:01<00:00, 73478.34KB/s]
     96%|########
 #5| 127131/132723 [00:01<00:00, 76715.07KB/s]
    100%|##########| 132723/132723 [00:01<00:00, 77785.71KB/s]
 
 
 
@@ -211,7 +211,7 @@ Display result
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  15.000 seconds)
+   **Total running time of the script:** ( 2 minutes  17.308 seconds)
 
 
 .. _sphx_glr_download_how_to_deploy_models_deploy_ssd_gluoncv.py:
diff --git a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
index 966b772a2..611b15644 100644
--- a/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/deploy_models/sg_execution_times.rst.txt
@@ -5,13 +5,13 @@
 
 Computation times
 =================
-**10:35.912** total execution time for **how_to_deploy_models** files:
+**10:36.996** total execution time for **how_to_deploy_models** files:
 
-- **02:52.190**: :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``)
-- **02:14.1000**: :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)
-- **02:04.665**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)
-- **01:28.814**: :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)
-- **01:05.324**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)
-- **00:27.679**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)
-- **00:22.056**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)
-- **00:00.184**: :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)
+- **02:55.555**: :ref:`sphx_glr_how_to_deploy_models_deploy_object_detection_pytorch.py` (``deploy_object_detection_pytorch.py``)
+- **02:17.308**: :ref:`sphx_glr_how_to_deploy_models_deploy_ssd_gluoncv.py` (``deploy_ssd_gluoncv.py``)
+- **01:58.792**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized_tflite.py` (``deploy_prequantized_tflite.py``)
+- **01:27.202**: :ref:`sphx_glr_how_to_deploy_models_deploy_quantized.py` (``deploy_quantized.py``)
+- **01:07.105**: :ref:`sphx_glr_how_to_deploy_models_deploy_prequantized.py` (``deploy_prequantized.py``)
+- **00:28.431**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_android.py` (``deploy_model_on_android.py``)
+- **00:22.401**: :ref:`sphx_glr_how_to_deploy_models_deploy_model_on_rasp.py` (``deploy_model_on_rasp.py``)
+- **00:00.202**: :ref:`sphx_glr_how_to_deploy_models_deploy_sparse.py` (``deploy_sparse.py``)
diff --git a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
index 5438ff172..7e7d66b60 100644
--- a/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/bring_your_own_datatypes.rst.txt
@@ -425,7 +425,7 @@ First let us define two helper functions to get the mobilenet model and a cat im
 
  .. code-block:: none
 
-    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipf0c0bb47-4970-4a0b-b2bc-ac6cf191dd17 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+    Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip6607e628-6104-4e6e-bc94-96a761f538ed from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 
 
 
diff --git a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
index 4c193d175..8693572d5 100644
--- a/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/sg_execution_times.rst.txt
@@ -5,9 +5,9 @@
 
 Computation times
 =================
-**00:38.104** total execution time for **how_to_extend_tvm** files:
+**00:39.683** total execution time for **how_to_extend_tvm** files:
 
-- **00:34.660**: :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``)
-- **00:02.226**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)
-- **00:01.033**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)
-- **00:00.185**: :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)
+- **00:35.811**: :ref:`sphx_glr_how_to_extend_tvm_bring_your_own_datatypes.py` (``bring_your_own_datatypes.py``)
+- **00:02.338**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_instrument.py` (``use_pass_instrument.py``)
+- **00:01.323**: :ref:`sphx_glr_how_to_extend_tvm_use_pass_infra.py` (``use_pass_infra.py``)
+- **00:00.211**: :ref:`sphx_glr_how_to_extend_tvm_low_level_custom_pass.py` (``low_level_custom_pass.py``)
diff --git a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
index dba9e5ab9..5845164ec 100644
--- a/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
+++ b/docs/_sources/how_to/extend_tvm/use_pass_instrument.rst.txt
@@ -199,10 +199,10 @@ profile the execution time of each passes.
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 6080us [6080us] (45.41%; 45.41%)
-    FoldScaleAxis: 7310us [5us] (54.59%; 54.59%)
-            FoldConstant: 7305us [1471us] (54.56%; 99.93%)
-                    InferType: 5834us [5834us] (43.57%; 79.86%)
+    InferType: 6585us [6585us] (45.67%; 45.67%)
+    FoldScaleAxis: 7835us [5us] (54.33%; 54.33%)
+            FoldConstant: 7829us [1572us] (54.29%; 99.93%)
+                    InferType: 6257us [6257us] (43.39%; 79.93%)
 
 
 
@@ -239,10 +239,10 @@ Refer to following sections and :py:func:`tvm.instrument.pass_instrument` for th
  .. code-block:: none
 
     Printing results of timing profile...
-    InferType: 5872us [5872us] (44.48%; 44.48%)
-    FoldScaleAxis: 7329us [4us] (55.52%; 55.52%)
-            FoldConstant: 7325us [1552us] (55.49%; 99.94%)
-                    InferType: 5773us [5773us] (43.73%; 78.82%)
+    InferType: 6258us [6258us] (44.54%; 44.54%)
+    FoldScaleAxis: 7794us [4us] (55.46%; 55.46%)
+            FoldConstant: 7789us [1607us] (55.43%; 99.94%)
+                    InferType: 6183us [6183us] (44.00%; 79.37%)
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
index 384670374..fd8c403ea 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_cuda.rst.txt
@@ -295,7 +295,7 @@ latency of convolution.
 
  .. code-block:: none
 
-    Convolution: 35.924908 ms
+    Convolution: 54.171603 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
index 01e7849e1..b53b2c316 100644
--- a/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_conv_tensorcore.rst.txt
@@ -628,7 +628,7 @@ be able to run on our build server
 
  .. code-block:: none
 
-    conv2d with tensor core: 13.361755 ms
+    conv2d with tensor core: 8.954840 ms
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
index 88a72c852..5d8fe4588 100644
--- a/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/opt_gemm.rst.txt
@@ -118,8 +118,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 
  .. code-block:: none
 
-    Numpy running time: 0.017828
-    Baseline: 3.377547
+    Numpy running time: 0.018505
+    Baseline: 3.443746
 
 
 
@@ -210,7 +210,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 
  .. code-block:: none
 
-    Opt1: 0.296004
+    Opt1: 0.306762
 
 
 
@@ -309,7 +309,7 @@ In this tutorial, we chose to vectorize the inner loop row data since it is cach
 
  .. code-block:: none
 
-    Opt2: 0.329275
+    Opt2: 0.332624
 
 
 
@@ -401,7 +401,7 @@ the access pattern for A matrix is more cache friendly.
 
  .. code-block:: none
 
-    Opt3: 0.120399
+    Opt3: 0.116654
 
 
 
@@ -520,7 +520,7 @@ flattening.
 
  .. code-block:: none
 
-    Opt4: 0.110560
+    Opt4: 0.110416
 
 
 
@@ -638,7 +638,7 @@ write to C when all the block results are ready.
 
  .. code-block:: none
 
-    Opt5: 0.110722
+    Opt5: 0.111222
 
 
 
@@ -759,7 +759,7 @@ Futhermore, we can also utilize multi-core processors to do the thread-level par
 
  .. code-block:: none
 
-    Opt6: 0.145105
+    Opt6: 0.145492
 
 
 
diff --git a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
index 55f6c528f..2273680c5 100644
--- a/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/optimize_operators/sg_execution_times.rst.txt
@@ -5,8 +5,8 @@
 
 Computation times
 =================
-**00:34.964** total execution time for **how_to_optimize_operators** files:
+**00:35.245** total execution time for **how_to_optimize_operators** files:
 
-- **00:32.110**: :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)
-- **00:01.615**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``)
-- **00:01.238**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)
+- **00:32.570**: :ref:`sphx_glr_how_to_optimize_operators_opt_gemm.py` (``opt_gemm.py``)
+- **00:01.459**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_tensorcore.py` (``opt_conv_tensorcore.py``)
+- **00:01.216**: :ref:`sphx_glr_how_to_optimize_operators_opt_conv_cuda.py` (``opt_conv_cuda.py``)
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
index 08a6366c4..5b5d793e6 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/sg_execution_times.rst.txt
@@ -5,11 +5,11 @@
 
 Computation times
 =================
-**05:16.371** total execution time for **how_to_tune_with_autoscheduler** files:
-
-- **02:37.623**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``)
-- **01:19.340**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)
-- **00:42.014**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)
-- **00:19.946**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)
-- **00:09.090**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)
-- **00:08.359**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)
+**05:16.489** total execution time for **how_to_tune_with_autoscheduler** files:
+
+- **02:38.683**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py` (``tune_conv2d_layer_cuda.py``)
+- **01:20.354**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_x86.py` (``tune_network_x86.py``)
+- **00:42.530**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_cuda.py` (``tune_network_cuda.py``)
+- **00:17.587**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_sparse_x86.py` (``tune_sparse_x86.py``)
+- **00:08.860**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_mali.py` (``tune_network_mali.py``)
+- **00:08.475**: :ref:`sphx_glr_how_to_tune_with_autoscheduler_tune_network_arm.py` (``tune_network_arm.py``)
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
index 8de53456d..0f8269eea 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.rst.txt
@@ -751,7 +751,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 0.370 ms
+    Execution time of this operator: 0.350 ms
 
 
 
@@ -1351,7 +1351,7 @@ In the example below we resume the status and do more 5 trials.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 2 minutes  37.623 seconds)
+   **Total running time of the script:** ( 2 minutes  38.683 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_conv2d_layer_cuda.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
index ee8b842d1..f4a75f9ec 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_cuda.rst.txt
@@ -616,7 +616,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      10.0358      10.0072      10.1016       9.9986       0.0467   
+       9.6087       9.6029       9.6330       9.5902       0.0179   
                
 
 
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
index 88f7df46d..69fd1e03e 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_network_x86.rst.txt
@@ -635,7 +635,7 @@ so we can read the log file and load the best schedules.
     Evaluate inference time cost...
     Execution time summary:
      mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
-      753.0981     753.2482     753.2847     752.7613      0.2386   
+      758.6266     758.8503     761.0661     755.9634      2.0892   
                
 
 
@@ -660,7 +660,7 @@ Other Tips
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 1 minutes  19.340 seconds)
+   **Total running time of the script:** ( 1 minutes  20.354 seconds)
 
 
 .. _sphx_glr_download_how_to_tune_with_autoscheduler_tune_network_x86.py:
diff --git a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
index ab936939b..857a182ea 100644
--- a/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
+++ b/docs/_sources/how_to/tune_with_autoscheduler/tune_sparse_x86.rst.txt
@@ -362,7 +362,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
                  placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
                  compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
       buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-      preflattened_buffer_map = {placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_7: placeholder_19: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], [])} {
+      preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_7: placeholder_19: Buffer(placeholder_12, int32, [4916], [])} {
       for (i0.outer.i1.outer.fused: int32, 0, 32) "parallel" {
         allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global {
           for (nb_j.inner: int32, 0, 2) {
@@ -485,7 +485,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 1.847 ms
+    Execution time of this operator: 1.838 ms
 
 
 
diff --git a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
index b605ad576..6bb44541d 100644
--- a/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/sg_execution_times.rst.txt
@@ -5,10 +5,10 @@
 
 Computation times
 =================
-**00:44.445** total execution time for **how_to_tune_with_autotvm** files:
+**00:44.175** total execution time for **how_to_tune_with_autotvm** files:
 
-- **00:43.616**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)
-- **00:00.213**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``)
-- **00:00.213**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)
-- **00:00.203**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)
-- **00:00.199**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)
+- **00:43.281**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_conv2d_cuda.py` (``tune_conv2d_cuda.py``)
+- **00:00.232**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_x86.py` (``tune_relay_x86.py``)
+- **00:00.228**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_cuda.py` (``tune_relay_cuda.py``)
+- **00:00.218**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_arm.py` (``tune_relay_arm.py``)
+- **00:00.217**: :ref:`sphx_glr_how_to_tune_with_autotvm_tune_relay_mobile_gpu.py` (``tune_relay_mobile_gpu.py``)
diff --git a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
index ad6a52b84..a99ada31e 100644
--- a/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
+++ b/docs/_sources/how_to/tune_with_autotvm/tune_conv2d_cuda.rst.txt
@@ -859,8 +859,8 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 4, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2885496
-    No: 6   GFLOPS: 97.01/97.01     result: MeasureResult(costs=(0.002386400625,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.612271785736084, timestamp=1654294869.017783)       [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
-    No: 7   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 6   GFLOPS: 42.39/42.39     result: MeasureResult(costs=(0.005461075315789474,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.610039234161377, timestamp=1654298603.3978002)        [('tile_f', [-1, 1, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,3754080
+    No: 7   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -983,7 +983,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 256, 1]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6225319
-    No: 8   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 8   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1106,7 +1106,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 32]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 8, 64]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,943546
-    No: 9   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 9   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1229,7 +1229,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 4, 16, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 16, 32]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2868708
-    No: 10  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 10  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 142, in build
         res = future.result()
       File "/usr/lib/python3.7/concurrent/futures/_base.py", line 435, in result
@@ -1247,7 +1247,7 @@ for this template
     TimeoutError
 
             [('tile_f', [-1, 32, 2, 4]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 4, 2]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4691833
-    No: 11  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 11  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1370,7 +1370,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 2, 64]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,1042124
-    No: 12  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 12  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1493,7 +1493,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 32, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 32, 16]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,10013405
-    No: 13  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 13  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1616,7 +1616,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 8, 8, 2]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 4, 32]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6732082
-    No: 14  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 14  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1739,7 +1739,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 4, 32]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 4, 128]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)],None,7536735
-    No: 15  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 15  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1862,7 +1862,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 4]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 128, 4]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)],None,482121
-    No: 16  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 16  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -1985,7 +1985,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 32, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)],None,2824525
-    No: 17  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 17  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2108,7 +2108,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 64, 1, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 8, 8]), ('tile_ry', [-1, 1, 3]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)],None,4559286
-    No: 18  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 18  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 571, in __call__
         func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 523, in _build_func_common
@@ -2231,7 +2231,7 @@ for this template
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 854, in verify_pass
         raise InstantiationError("Skipped because of invalid gpu kernel")
     tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [('tile_f', [-1, 1, 32, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 512]), ('tile_ry', [-1, 3, 1]), ('tile_rx', [-1, 3, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9677544
-    No: 19  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+    No: 19  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 721, in __call__
         yield remote, remote.load_module(os.path.split(build_result.filename)[1])
       File "/workspace/python/tvm/autotvm/measure/measure_methods.py", line 685, in run_through_rpc
@@ -2319,7 +2319,7 @@ for this template
       15: _PyEval_EvalFrameDefault
       14: 0x0000000000537c30
       13: _PyObject_FastCallKeywords
-      12: 0x00007f602848afa2
+      12: 0x00007f37cee58fa2
       11: _ctypes_callproc
       10: ffi_call
       9: ffi_call_unix64
@@ -2384,7 +2384,7 @@ for this template
       21: _PyFunction_FastCallKeywords
       20: _PyEval_EvalFrameDefault
       19: _PyFunction_FastCall      [('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 1, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,6390073
-    No: 20  GFLOPS: 142.13/142.13   result: MeasureResult(costs=(0.0016287441699999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4211044311523438, timestamp=1654294895.6017895)      [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
+    No: 20  GFLOPS: 145.03/145.03   result: MeasureResult(costs=(0.0015962098,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4588935375213623, timestamp=1654298629.9176986)       [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
 
 
 
@@ -2437,7 +2437,7 @@ and measure running time.
 
     Best config:
     [('tile_f', [-1, 1, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 4, 1]), ('tile_ry', [-1, 1, 1]), ('tile_rx', [-1, 1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)],None,9881539
-    Time cost of this operator: 0.002030
+    Time cost of this operator: 0.002019
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/index.rst.txt b/docs/_sources/how_to/work_with_microtvm/index.rst.txt
index 8ca84fd6c..29595f4fe 100644
--- a/docs/_sources/how_to/work_with_microtvm/index.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/index.rst.txt
@@ -94,6 +94,26 @@ demonstrate how to tune and deploy models with microTVM.
 
    /how_to/work_with_microtvm/micro_tflite
 
+.. raw:: html
+
+    <div class="sphx-glr-thumbcontainer" tooltip="This tutorial shows how MobileNetV1 models can be trained to fit on embedded devices, and how t...">
+
+.. only:: html
+
+    .. figure:: /how_to/work_with_microtvm/images/thumb/sphx_glr_micro_train_thumb.png
+
+        :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py`
+
+.. raw:: html
+
+    </div>
+
+
+.. toctree::
+   :hidden:
+
+   /how_to/work_with_microtvm/micro_train
+
 .. raw:: html
 
     <div class="sphx-glr-thumbcontainer" tooltip="This tutorial explains how to compile a tiny model for a micro device, build a program on Zephy...">
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
index df68a1a3e..a2f495644 100644
--- a/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/micro_autotune.rst.txt
@@ -294,10 +294,10 @@ Timing the untuned program
     ########## Build without Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  
     ---------                                     ---                                           --------  -------  -----              ------  -------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  313.4     98.748   (1, 2, 10, 10, 3)  2       1        
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.042     0.959    (1, 6, 10, 10)     1       1        
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.93      0.293    (1, 1, 10, 10, 3)  1       1        
-    Total_time                                    -                                             317.372   -        -                  -       -        
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  314.7     98.745   (1, 2, 10, 10, 3)  2       1        
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.076     0.965    (1, 6, 10, 10)     1       1        
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.924     0.29     (1, 1, 10, 10, 3)  1       1        
+    Total_time                                    -                                             318.7     -        -                  -       -        
 
 
 
@@ -359,10 +359,10 @@ Timing the tuned program
     ########## Build with Autotuning ##########
     Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs  
     ---------                                     ---                                           --------  -------  -----              ------  -------  
-    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  151.6     98.264   (1, 6, 10, 10, 1)  2       1        
-    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.74      1.128    (1, 6, 10, 10)     1       1        
-    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.939     0.609    (1, 1, 10, 10, 3)  1       1        
-    Total_time                                    -                                             154.279   -        -                  -       -        
+    tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  133.0     97.979   (1, 6, 10, 10, 1)  2       1        
+    tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.838     1.354    (1, 6, 10, 10)     1       1        
+    tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.905     0.667    (1, 1, 10, 10, 3)  1       1        
+    Total_time                                    -                                             135.744   -        -                  -       -        
 
 
 
diff --git a/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
new file mode 100644
index 000000000..1843ec8cb
--- /dev/null
+++ b/docs/_sources/how_to/work_with_microtvm/micro_train.rst.txt
@@ -0,0 +1,856 @@
+.. note::
+    :class: sphx-glr-download-link-note
+
+    Click :ref:`here <sphx_glr_download_how_to_work_with_microtvm_micro_train.py>` to download the full example code
+.. rst-class:: sphx-glr-example-title
+
+.. _sphx_glr_how_to_work_with_microtvm_micro_train.py:
+
+
+.. _microtvm-train-arduino:
+
+Training Vision Models for microTVM on Arduino
+==============================================
+**Author**: `Gavin Uberti <https://github.com/guberti>`_
+
+This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.
+
+.. note::
+
+  This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+  using the link at the bottom of this page, or open it online for free using Google Colab.
+  Click the icon below to open in Google Colab.
+
+.. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png
+     :align: center
+     :target: https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb
+     :width: 300px
+
+Motivation
+----------
+When building IoT devices, we often want them to **see and understand** the world around them.
+This can take many forms, but often a device will want to know if a certain **kind of
+object** is in its field of vision.
+
+For example, a security camera might look for **people**, so it can decide whether to save a video
+to memory. A traffic light might look for **cars**, so it can judge which lights should change
+first. Or a forest camera might look for a **kind of animal**, so researchers can estimate how
+large the animal population is.
+
+To make these devices affordable, we would like them to need only a low-cost processor like the
+`nRF52840 <https://www.nordicsemi.com/Products/nRF52840>`_ (costing five dollars each on Mouser) or the `RP2040 <https://www.raspberrypi.com/products/rp2040/>`_ (just $1.45 each!).
+
+These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI
+vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will
+show how these models can be modified to work around this requirement. Then, we will use TVM
+to compile and deploy one for an Arduino that uses one of these processors.
+
+Installing the Prerequisites
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This tutorial will use TensorFlow, a widely used machine learning library created by Google,
+to train the model. TensorFlow is a very low-level library, however, so we will use the Keras
+interface to talk to it. We will also use TensorFlow Lite to perform quantization on
+our model, as TensorFlow by itself does not support this.
+
+Once we have our generated model, we will use TVM to compile and test it. To avoid having to
+build from source, we'll install ``tlcpack`` - a community build of TVM. Lastly, we'll also
+install ``imagemagick`` and ``curl`` to preprocess data:
+
+    .. code-block:: bash
+
+      %%bash
+      pip install -q tensorflow tflite
+      pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
+      apt-get -qq install imagemagick curl
+
+      # Install Arduino CLI and library for Nano 33 BLE
+      curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
+      /content/bin/arduino-cli core update-index
+      /content/bin/arduino-cli core install arduino:mbed_nano
+
+Using the GPU
+^^^^^^^^^^^^^
+
+This tutorial demonstrates training a neural network, which requires a lot of computing power
+and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you
+can enable a GPU by going to **Runtime->Change runtime type** and selecting "GPU" as the hardware
+accelerator. If you are running locally, you can `follow TensorFlow's guide <https://www.tensorflow.org/guide/gpu>`_ instead.
+
+We can test our GPU installation with the following code:
+
+
+.. code-block:: default
+
+
+    import tensorflow as tf
+
+    if not tf.test.gpu_device_name():
+        print("No GPU was detected!")
+        print("Model training will take much longer (~30 minutes instead of ~5)")
+    else:
+        print("GPU detected - you're good to go.")
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    No GPU was detected!
+    Model training will take much longer (~30 minutes instead of ~5)
+
+
+
+Choosing Our Work Dir
+^^^^^^^^^^^^^^^^^^^^^
+We need to pick a directory where our image datasets, trained model, and eventual Arduino sketch
+will all live. If running on Google Colab, we'll save everything in ``/root`` (aka ``~``) but you'll
+probably want to store it elsewhere if running locally. Note that this variable only affects the
+Python scripts - if you change it, you'll have to adjust the Bash commands to match.
+
+
+.. code-block:: default
+
+
+    import os
+
+    FOLDER = "/root"
+    # sphinx_gallery_start_ignore
+    import tempfile
+
+    FOLDER = tempfile.mkdtemp()
+    # sphinx_gallery_end_ignore
+
+
+
+
+
+
+
+Downloading the Data
+--------------------
+Convolutional neural networks usually learn by looking at many images, along with labels telling
+the network what those images are. To get these images, we'll need a publicly available dataset
+with thousands of images of all sorts of objects and labels of what's in each image. We'll also
+need a bunch of images that **aren't** of cars, as we're trying to distinguish these two classes.
+
+In this tutorial, we'll create a model to detect if an image contains a **car**, but you can use
+whatever category you like! Just change the source URL below to one containing images of another
+type of object.
+
+To get our car images, we'll be downloading the `Stanford Cars dataset <http://ai.stanford.edu/~jkrause/cars/car_dataset.html>`_,
+which contains 16,185 full-color images of cars. We'll also need images of random things that
+aren't cars, so we'll use the `COCO 2017 <https://cocodataset.org/#home>`_ validation set (it's
+smaller, and thus faster to download, than the full training set; training on the full dataset
+would yield better results). Note that there are some cars in the COCO 2017 dataset, but it's
+a small enough fraction not to matter - just keep in mind that this will drive down our perceived
+accuracy slightly.
+
+We could use the TensorFlow dataloader utilities, but we'll instead do it manually to make sure
+it's easy to change the datasets being used. We'll end up with the following file hierarchy:
+
+    .. code-block::
+
+        /root
+        ├── images
+        │   ├── target
+        │   │   ├── 000001.jpg
+        │   │   │ ...
+        │   │   └── 016185.jpg
+        │   ├── target.tgz
+        │   ├── random
+        │   │   ├── 000000000139.jpg
+        │   │   │ ...
+        │   │   └── 000000581781.jpg
+        │   └── random.zip
+
+We should also note that the Stanford Cars training set has ~8k images, while the COCO 2017
+validation set is 5k images - it is not a 50/50 split! If we wanted to, we could weight these
+classes differently during training to correct for this, but training will still work if we
+ignore it. It should take about **2 minutes** to download the Stanford Cars dataset, while
+COCO 2017 validation will take **1 minute**.
+
+
+.. code-block:: default
+
+
+    import os
+    import shutil
+    import urllib.request
+
+    # Download datasets
+    os.makedirs(f"{FOLDER}/images")
+    urllib.request.urlretrieve(
+        "http://ai.stanford.edu/~jkrause/car196/cars_train.tgz", f"{FOLDER}/images/target.tgz"
+    )
+    urllib.request.urlretrieve(
+        "http://images.cocodataset.org/zips/val2017.zip", f"{FOLDER}/images/random.zip"
+    )
+
+    # Extract them and rename their folders
+    shutil.unpack_archive(f"{FOLDER}/images/target.tgz", f"{FOLDER}/images")
+    shutil.unpack_archive(f"{FOLDER}/images/random.zip", f"{FOLDER}/images")
+    shutil.move(f"{FOLDER}/images/cars_train", f"{FOLDER}/images/target")
+    shutil.move(f"{FOLDER}/images/val2017", f"{FOLDER}/images/random")
+
+
+
+
+
+
+
+Loading the Data
+----------------
+Currently, our data is stored on-disk as JPG files of various sizes. To train with it, we'll have
+to load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed
+data. Keras's ``image_dataset_from_directory`` will take care of most of this, though it loads
+images such that each pixel value is a float from 0 to 255.
+
+We'll also need to load labels, though Keras will help with this. From our subdirectory structure,
+it knows the images in ``/target`` are one class, and those in ``/random`` another. Setting
+``label_mode='categorical'`` tells Keras to convert these into **categorical labels** - a 2x1 vector
+that's either ``[1, 0]`` for an object of our target class, or ``[0, 1]`` for anything else.
+We'll also set ``shuffle=True`` to randomize the order of our examples.
+
+We will also **batch** the data - grouping samples into clumps to make our training go faster.
+A ``batch_size`` of 32 is a decent default.
+
+Lastly, in machine learning we generally want our inputs to be small numbers. We'll thus use a
+``Rescaling`` layer to change our images such that each pixel is a float between ``0.0`` and ``1.0``,
+instead of ``0`` to ``255``. We need to be careful not to rescale our categorical labels though, so
+we'll use a ``lambda`` function.
+
+
+.. code-block:: default
+
+
+    IMAGE_SIZE = (64, 64, 3)
+    unscaled_dataset = tf.keras.utils.image_dataset_from_directory(
+        f"{FOLDER}/images",
+        batch_size=32,
+        shuffle=True,
+        label_mode="categorical",
+        image_size=IMAGE_SIZE[0:2],
+    )
+    rescale = tf.keras.layers.Rescaling(scale=1.0 / 255)
+    full_dataset = unscaled_dataset.map(lambda im, lbl: (rescale(im), lbl))
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    Found 13144 files belonging to 2 classes.
+
+
+
+What's Inside Our Dataset?
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+Before giving this data set to our neural network, we ought to give it a quick visual inspection.
+Does the data look properly transformed? Do the labels seem appropriate? And what's our ratio of
+objects to other stuff? We can display some examples from our datasets using ``matplotlib``:
+
+
+.. code-block:: default
+
+
+    import matplotlib.pyplot as plt
+
+    num_target_class = len(os.listdir(f"{FOLDER}/images/target/"))
+    num_random_class = len(os.listdir(f"{FOLDER}/images/random/"))
+    print(f"{FOLDER}/images/target contains {num_target_class} images")
+    print(f"{FOLDER}/images/random contains {num_random_class} images")
+
+    # Show some samples and their labels
+    SAMPLES_TO_SHOW = 10
+    plt.figure(figsize=(20, 10))
+    for i, (image, label) in enumerate(unscaled_dataset.unbatch()):
+        if i >= SAMPLES_TO_SHOW:
+            break
+        ax = plt.subplot(1, SAMPLES_TO_SHOW, i + 1)
+        plt.imshow(image.numpy().astype("uint8"))
+        plt.title(list(label.numpy()))
+        plt.axis("off")
+
+
+
+
+.. image:: /how_to/work_with_microtvm/images/sphx_glr_micro_train_001.png
+    :class: sphx-glr-single-img
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    /tmp/tmpb2g6f7s3/images/target contains 8144 images
+    /tmp/tmpb2g6f7s3/images/random contains 5000 images
+
+
+
+Validating our Accuracy
+^^^^^^^^^^^^^^^^^^^^^^^
+While developing our model, we'll often want to check how accurate it is (e.g. to see if it
+improves during training). How do we do this? We could just train it on *all* of the data, and
+then ask it to classify that same data. However, our model could cheat by just memorizing all of
+the samples, which would make it *appear* to have very high accuracy, but perform very badly in
+reality. In practice, this "memorizing" is called **overfitting**.
+
+To prevent this, we will set aside some of the data (we'll use 20%) as a **validation set**. Our
+model will never be trained on validation data - we'll only use it to check our model's accuracy.
+
+
+.. code-block:: default
+
+
+    num_batches = len(full_dataset)
+    train_dataset = full_dataset.take(int(num_batches * 0.8))
+    validation_dataset = full_dataset.skip(len(train_dataset))
+
+
+
+
+
+
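+As a quick sanity check (an illustrative snippet, not part of the generated gallery output), we
+can print how many batches ended up in each split - ``len()`` works here because the dataset's
+cardinality is known:
+
+    .. code-block:: python
+
+      print(f"Training batches: {len(train_dataset)}")
+      print(f"Validation batches: {len(validation_dataset)}")
+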
+
+Choosing Our Model
+------------------
+In the past decade, `convolutional neural networks <https://en.wikipedia.org/wiki/Convolutional_neural_network>`_ have been widely
+adopted for image classification tasks. State-of-the-art models like `EfficientNet V2 <https://arxiv.org/abs/2104.00298>`_ are able
+to perform image classification better than even humans! Unfortunately, these models have tens of
+millions of parameters, and thus won't fit on cheap security camera computers.
+
+Our applications generally don't need perfect accuracy - 90% is good enough. We can thus use the
+older and smaller MobileNet V1 architecture. But this *still* won't be small enough - by default,
+MobileNet V1 with 224x224 inputs and alpha 1.0 takes ~50 MB to just **store**. To reduce the size
+of the model, there are three knobs we can turn. First, we can reduce the size of the input images
+from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the **alpha**
+of the model, from 1.0 to 0.25, which downscales the width of the network (and the number of
+filters) by a factor of four. And if we were really strapped for space, we could reduce the
+number of **channels** by making our model take grayscale images instead of RGB ones.
+
+In this tutorial, we will use an RGB 64x64 input image and alpha 0.25. This is not quite
+ideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform
+transfer learning using the official TensorFlow source models (if we used alpha <0.25 or a
+grayscale input, we wouldn't be able to do this).
+
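+To get a feel for how much the input size and alpha knobs matter, we can compare parameter
+counts for a few configurations. The snippet below is only an illustration (it is not part of
+this tutorial's pipeline) and uses randomly initialized weights, so nothing is downloaded:
+
+    .. code-block:: python
+
+      # weights=None gives randomly initialized models, which allows
+      # arbitrary input sizes and alpha values
+      for size, alpha in [(224, 1.0), (96, 0.25), (64, 0.25)]:
+          m = tf.keras.applications.MobileNet(
+              input_shape=(size, size, 3), alpha=alpha, weights=None
+          )
+          print(f"{size}x{size}, alpha={alpha}: {m.count_params():,} parameters")
+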
+What is Transfer Learning?
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+Deep learning has `dominated image classification <https://paperswithcode.com/sota/image-classification-on-imagenet>`_ for a long time,
+but training neural networks takes a lot of time. When a neural network is trained "from scratch",
+its parameters start out randomly initialized, forcing it to learn very slowly how to tell images
+apart.
+
+With transfer learning, we instead start with a neural network that's **already** good at a
+specific task. In this example, that task is classifying images from `the ImageNet database <https://www.image-net.org/>`_. This
+means the network already has some object detection capabilities, and is likely closer to what you
+want than a random model would be.
+
+This works especially well with image processing neural networks like MobileNet. In practice, it
+turns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for
+identifying low-level features like lines and shapes - only the last few fully connected layers
+are used to determine how those shapes make up the objects the network is trying to detect.
+
+We can take advantage of this by starting training with a MobileNet model that was trained on
+ImageNet, and already knows how to identify those lines and shapes. We can then just remove the
+last few layers from this pretrained model, and add our own final layers. We'll then train this
+conglomerate model for a few epochs on our cars vs non-cars dataset, to adjust the first layers
+and train the last layers from scratch. This process of training an already-partially-trained
+model is called *fine-tuning*.
+
+Source MobileNets for transfer learning have been `pretrained by the TensorFlow folks <https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md>`_, so we
+can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).
+
+
+.. code-block:: default
+
+
+    os.makedirs(f"{FOLDER}/models")
+    WEIGHTS_PATH = f"{FOLDER}/models/mobilenet_2_5_128_tf.h5"
+    urllib.request.urlretrieve(
+        "https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5",
+        WEIGHTS_PATH,
+    )
+
+    pretrained = tf.keras.applications.MobileNet(
+        input_shape=IMAGE_SIZE, weights=WEIGHTS_PATH, alpha=0.25
+    )
+
+
+
+
+
+
+
+Modifying Our Network
+^^^^^^^^^^^^^^^^^^^^^
+As mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,
+but we want to convert it to classify cars. Since only the bottom few layers are task-specific,
+we'll **cut off the last five layers** of our original model. In their place we'll build our own
+"tail" to the model by performing reshape, dropout, flatten, and softmax operations.
+
+
+.. code-block:: default
+
+
+    model = tf.keras.models.Sequential()
+
+    model.add(tf.keras.layers.InputLayer(input_shape=IMAGE_SIZE))
+    model.add(tf.keras.Model(inputs=pretrained.inputs, outputs=pretrained.layers[-5].output))
+
+    model.add(tf.keras.layers.Reshape((-1,)))
+    model.add(tf.keras.layers.Dropout(0.1))
+    model.add(tf.keras.layers.Flatten())
+    model.add(tf.keras.layers.Dense(2, activation="softmax"))
+
+
+
+
+
+
+
+Fine Tuning Our Network
+^^^^^^^^^^^^^^^^^^^^^^^
+When training neural networks, we must set a parameter called the **learning rate** that controls
+how fast our network learns. It must be set carefully - too slow, and our network will take
+forever to train; too fast, and our network won't be able to learn some fine details. Generally
+for Adam (the optimizer we're using), ``0.001`` is a pretty good learning rate (and is what's
+recommended in the `original paper <https://arxiv.org/abs/1412.6980>`_). However, in this case
+``0.0005`` seems to work a little better.
+
+We'll also pass the validation set from earlier to ``model.fit``. This will evaluate how good our
+model is at the end of each epoch, and let us track how our model is improving. Once training is
+finished, the model should have a validation accuracy around ``0.98`` (meaning it was right 98% of
+the time on our validation set).
+
+
+.. code-block:: default
+
+
+    model.compile(
+        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
+        loss="categorical_crossentropy",
+        metrics=["accuracy"],
+    )
+    model.fit(train_dataset, validation_data=validation_dataset, epochs=3, verbose=2)
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    Epoch 1/3
+    328/328 - 54s - loss: 0.2303 - accuracy: 0.9204 - val_loss: 0.1384 - val_accuracy: 0.9577
+    Epoch 2/3
+    328/328 - 52s - loss: 0.0972 - accuracy: 0.9643 - val_loss: 0.1196 - val_accuracy: 0.9641
+    Epoch 3/3
+    328/328 - 52s - loss: 0.0668 - accuracy: 0.9751 - val_loss: 0.1581 - val_accuracy: 0.9520
+
+
+
+Quantization
+------------
+We've done a decent job of reducing our model's size so far - changing the input dimension,
+along with removing the bottom layers, reduced the model to just 219k parameters. However, each of
+these parameters is a ``float32`` that takes four bytes, so our model will take up almost one MB!
+
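+We can check this arithmetic with a quick back-of-envelope calculation (an illustration based
+on the model we just trained):
+
+    .. code-block:: python
+
+      # Each float32 parameter takes 4 bytes; a quantized int8 parameter takes 1
+      params = model.count_params()  # 219,058 for our model
+      print(f"float32: ~{params * 4 / 1024:.0f} KB")  # ~856 KB
+      print(f"int8:    ~{params / 1024:.0f} KB")  # ~214 KB
+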
+Additionally, it might be the case that our hardware doesn't have built-in support for floating
+point numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support,
+some others (like the Arduino Due) do not. On any boards *without* dedicated hardware support,
+floating point multiplication will be extremely slow.
+
+To address both issues, we will **quantize** the model - representing the weights as eight-bit
+integers. It's more complex than just rounding, though - to get the best performance, TensorFlow
+tracks how each neuron in our model activates, so we can figure out how to most accurately
+simulate the neuron's original activations with integer operations.
+
+We will help TensorFlow do this by creating a representative dataset - a subset of the original
+that is used for tracking how those neurons activate. We'll then pass this into a ``TFLiteConverter``
+(Keras itself does not have quantization support) with an ``Optimize`` flag to tell TFLite to perform
+the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must
+explicitly tell it to avoid this behavior.
+
+
+.. code-block:: default
+
+
+
+    def representative_dataset():
+        for image_batch, label_batch in full_dataset.take(10):
+            yield [image_batch]
+
+
+    converter = tf.lite.TFLiteConverter.from_keras_model(model)
+    converter.optimizations = [tf.lite.Optimize.DEFAULT]
+    converter.representative_dataset = representative_dataset
+    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
+    converter.inference_input_type = tf.uint8
+    converter.inference_output_type = tf.uint8
+
+    quantized_model = converter.convert()
+
+
+
+
+
+
+
+Download the Model if Desired
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We've now got a finished model that you can use locally or in other tutorials (try autotuning
+this model or viewing it on `https://netron.app/ <https://netron.app/>`_). But before we do
+those things, we'll have to write it to a file (``quantized.tflite``). If you're running this
+tutorial on Google Colab, you'll have to uncomment the last two lines to download the file
+after writing it.
+
+
+.. code-block:: default
+
+
+    QUANTIZED_MODEL_PATH = f"{FOLDER}/models/quantized.tflite"
+    with open(QUANTIZED_MODEL_PATH, "wb") as f:
+        f.write(quantized_model)
+    # from google.colab import files
+    # files.download(QUANTIZED_MODEL_PATH)
+
+
+
+
+
+
+
+Compiling With TVM For Arduino
+------------------------------
+TensorFlow has a built-in framework for deploying to microcontrollers - `TFLite Micro <https://www.tensorflow.org/lite/microcontrollers>`_. However,
+it's poorly supported by development boards and does not support autotuning. We will use Apache
+TVM instead.
+
+TVM can be used either with its command line interface (``tvmc``) or with its Python interface. The
+Python interface is fully-featured and more stable, so we'll use it here.
+
+TVM is an optimizing compiler, and optimizations to our model are performed in stages via
+**intermediate representations**. The first of these is `Relay <https://arxiv.org/abs/1810.00952>`_, a high-level intermediate
+representation emphasizing portability. The conversion from ``.tflite`` to Relay is done without any
+knowledge of our "end goal" - the fact we intend to run this model on an Arduino.
+
+Choosing an Arduino Board
+^^^^^^^^^^^^^^^^^^^^^^^^^
+Next, we'll have to decide exactly which Arduino board to use. The Arduino sketch that we
+ultimately generate should be compatible with any board, but knowing which board we are using in
+advance allows TVM to adjust its compilation strategy to get better performance.
+
+There is one catch - we need enough **memory** (flash and RAM) to be able to run our model. We
+won't ever be able to run a complex vision model like MobileNet on an Arduino Uno - that board
+only has 2 KB of RAM and 32 KB of flash! Our model has ~200,000 parameters, so there is just no
+way it could fit.
+
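+One quick way to sanity-check a board's flash budget is to look at the size of the quantized
+model we produced earlier (an illustrative check, not part of the original script):
+
+    .. code-block:: python
+
+      # The size of the quantized .tflite file is a rough proxy for
+      # how much flash the model will need on-device
+      print(f"Quantized model: ~{len(quantized_model) / 1024:.0f} KB")
+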
+For this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.
+However, any other Arduino with those specs or better should also work.
+
+Generating our project
+^^^^^^^^^^^^^^^^^^^^^^
+Next, we'll compile the model to TVM's MLF (model library format) intermediate representation,
+which consists of C/C++ code and is designed for autotuning. To improve performance, we'll tell
+TVM that we're compiling for the ``nrf52840`` microprocessor (the one the Nano 33 BLE uses). We'll
+also tell it to use the C runtime (abbreviated ``crt``) and to use ahead-of-time memory allocation
+(abbreviated ``aot``, which helps reduce the model's memory footprint). Lastly, we will disable
+vectorization with ``"tir.disable_vectorize": True``, as C has no native vectorized types.
+
+Once we have set these configuration parameters, we will call ``tvm.relay.build`` to compile our
+Relay model into the MLF intermediate representation. From here, we just need to call
+``tvm.micro.generate_project`` and pass in the Arduino template project to finish compilation.
+
+
+.. code-block:: default
+
+
+    import shutil
+    import tflite
+    import tvm
+
+    # Method to load model is different in TFLite 1 vs 2
+    try:  # TFLite 2.1 and above
+        tflite_model = tflite.Model.GetRootAsModel(quantized_model, 0)
+    except AttributeError:  # Fall back to TFLite 1.14 method
+        tflite_model = tflite.Model.Model.GetRootAsModel(quantized_model, 0)
+
+    # Convert to the Relay intermediate representation
+    mod, params = tvm.relay.frontend.from_tflite(tflite_model)
+
+    # Set configuration flags to improve performance
+    target = tvm.target.target.micro("nrf52840")
+    runtime = tvm.relay.backend.Runtime("crt")
+    executor = tvm.relay.backend.Executor("aot", {"unpacked-api": True})
+
+    # Convert to the MLF intermediate representation
+    with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
+        mod = tvm.relay.build(mod, target, runtime=runtime, executor=executor, params=params)
+
+    # Generate an Arduino project from the MLF intermediate representation
+    shutil.rmtree(f"{FOLDER}/models/project", ignore_errors=True)
+    arduino_project = tvm.micro.generate_project(
+        tvm.micro.get_microtvm_template_projects("arduino"),
+        mod,
+        f"{FOLDER}/models/project",
+        {
+            "arduino_board": "nano33ble",
+            "arduino_cli_cmd": "/content/bin/arduino-cli",
+            "project_type": "example_project",
+        },
+    )
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
+      "target_host parameter is going to be deprecated. "
+
+
+
+Testing our Arduino Project
+---------------------------
+Consider the following two 224x224 images from the author's camera roll - one of a car, one not.
+We will test our Arduino project by loading both of these images and executing the compiled model
+on them.
+
+.. image:: https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png
+     :align: center
+     :height: 200px
+     :width: 600px
+
+Currently, these are 224x224 PNG images we can download from Imgur. Before we can feed in these
+images, we'll need to resize and convert them to raw data, which can be done with ``imagemagick``.
+
+It's also challenging to load raw data onto an Arduino, as only C/C++ files (and similar) are
+compiled. We can work around this by embedding our raw data in a hard-coded C array with the
+built-in utility ``bin2c`` that will output a file like below:
+
+    .. code-block:: c
+
+      static const unsigned char CAR_IMAGE[] = {
+        0x22,0x23,0x14,0x22,
+        ...
+        0x07,0x0e,0x08,0x08
+      };
+
+We can do both of these things with a few lines of Bash code:
+
+    .. code-block:: bash
+
+      %%bash
+      mkdir -p ~/tests
+      curl "https://i.imgur.com/JBbEhxN.png" -o ~/tests/car_224.png
+      convert ~/tests/car_224.png -resize 64 ~/tests/car_64.png
+      stream ~/tests/car_64.png ~/tests/car.raw
+      bin2c -c -st ~/tests/car.raw --name CAR_IMAGE > ~/models/project/car.c
+
+      curl "https://i.imgur.com/wkh7Dx2.png" -o ~/tests/catan_224.png
+      convert ~/tests/catan_224.png -resize 64 ~/tests/catan_64.png
+      stream ~/tests/catan_64.png ~/tests/catan.raw
+      bin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE > ~/models/project/catan.c
+
+Writing our Arduino Script
+--------------------------
+We now need a little bit of Arduino code to read the two binary arrays we just generated, run the
+model on them, and log the output to the serial monitor. This file will replace ``arduino_sketch.ino``
+as the main file of our sketch. You'll have to copy this code in manually.
+
+    .. code-block:: c
+
+        %%writefile /root/models/project.ino
+        #include "src/model.h"
+        #include "car.c"
+        #include "catan.c"
+
+        void setup() {
+          Serial.begin(9600);
+          TVMInitialize();
+        }
+
+        void loop() {
+          uint8_t result_data[2];
+          Serial.println("Car results:");
+          TVMExecute(const_cast<uint8_t*>(CAR_IMAGE), result_data);
+          Serial.print(result_data[0]); Serial.print(", ");
+          Serial.print(result_data[1]); Serial.println();
+
+          Serial.println("Other object results:");
+          TVMExecute(const_cast<uint8_t*>(CATAN_IMAGE), result_data);
+          Serial.print(result_data[0]); Serial.print(", ");
+          Serial.print(result_data[1]); Serial.println();
+
+          delay(1000);
+        }
+
+Compiling Our Code
+^^^^^^^^^^^^^^^^^^
+Now that our project has been generated, TVM's job is mostly done! We can still call
+``arduino_project.build()`` and ``arduino_project.upload()``, but these just use ``arduino-cli``'s
+compile and flash commands underneath. We could also begin autotuning our model, but that's a
+subject for a different tutorial. To finish up, we'll verify no compiler errors are thrown
+by our project:
+
+
+.. code-block:: default
+
+
+    shutil.rmtree(f"{FOLDER}/models/project/build", ignore_errors=True)
+    # sphinx_gallery_start_ignore
+    from unittest.mock import MagicMock
+
+    arduino_project = MagicMock()
+    # sphinx_gallery_end_ignore
+    arduino_project.build()
+    print("Compilation succeeded!")
+
+
+
+
+
+.. rst-class:: sphx-glr-script-out
+
+ Out:
+
+ .. code-block:: none
+
+    Compilation succeeded!
+
+
+
+Uploading to Our Device
+-----------------------
+The very last step is uploading our sketch to an Arduino to make sure our code works properly.
+Unfortunately, we can't do that from Google Colab, so we'll have to download our sketch. This is
+simple enough to do - we'll just turn our project into a ``.zip`` archive and call ``files.download``.
+If you're running on Google Colab, you'll have to uncomment the last two lines to download the file
+after writing it.
+
+
+.. code-block:: default
+
+
+    ZIP_FOLDER = f"{FOLDER}/models/project"
+    shutil.make_archive(ZIP_FOLDER, "zip", ZIP_FOLDER)
+    # from google.colab import files
+    # files.download(f"{FOLDER}/models/project.zip")
+    # sphinx_gallery_start_ignore
+    # Run a few unit tests to make sure the Python code worked
+
+    # Ensure transfer learn model was correctly assembled
+    assert len(model.layers) == 5
+    assert model.count_params() == 219058  # Only 219,058 of these are trainable
+
+    assert len(quantized_model) >= 250000  # Quantized model will be 250 KB - 350 KB
+    assert len(quantized_model) <= 350000  # Exact value depends on quantization
+
+    # Assert .tflite and .zip files were written to disk
+    assert os.path.isfile(f"{FOLDER}/models/quantized.tflite")
+    assert os.path.isfile(f"{FOLDER}/models/project.zip")
+
+    # Assert MLF file was correctly generated
+    assert str(mod.executor) == "aot"
+
+    # Remove the temporary folder we generated at the beginning
+    shutil.rmtree(FOLDER)
+    # sphinx_gallery_end_ignore
+
+
+
+
+
+
+
+
+From here, we'll need to open it in the Arduino IDE. You'll have to download the IDE as well as
+the SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may
+have to change settings to control how much memory you want the board to use.
+
+Expected Results
+^^^^^^^^^^^^^^^^
+If all works as expected, you should see the following output on a Serial monitor:
+
+    .. code-block::
+
+      Car results:
+      255, 0
+      Other object results:
+      0, 255
+
+The first number represents the model's confidence that the object **is** a car and ranges from
+0-255. The second number represents the model's confidence that the object **is not** a car and
+is also 0-255. These results mean the model is very sure that the first image is a car, and the
+second image is not (which is correct). Hence, our model is working!
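+
+If you'd rather read these scores as percentages, they're easy to rescale (a small
+illustration; the values below are taken from the expected output above):
+
+    .. code-block:: python
+
+      # The uint8 scores range from 0 to 255, so divide by 255 to normalize
+      car_score, not_car_score = 255, 0  # as read from the serial monitor
+      print(f"Car: {car_score / 255:.0%}, not car: {not_car_score / 255:.0%}")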
+
+Summary
+-------
+In this tutorial, we used transfer learning to quickly train an image recognition model to
+identify cars. We modified its input dimensions and last few layers to make it better at this
+task, as well as faster and smaller. We then quantized the model and compiled it using TVM to
+create an Arduino sketch. Lastly, we tested the model using two static images to prove it works
+as intended.
+
+Next Steps
+^^^^^^^^^^
+From here, we could modify the model to read live images from the camera - we have another
+Arduino tutorial for how to do that `on GitHub <https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection>`_. Alternatively, we could
+`use TVM's autotuning capabilities <https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html>`_ to dramatically improve the model's performance.
+
+
+
+.. rst-class:: sphx-glr-timing
+
+   **Total running time of the script:** ( 4 minutes  7.242 seconds)
+
+
+.. _sphx_glr_download_how_to_work_with_microtvm_micro_train.py:
+
+
+.. only :: html
+
+ .. container:: sphx-glr-footer
+    :class: sphx-glr-footer-example
+
+
+
+  .. container:: sphx-glr-download
+
+     :download:`Download Python source code: micro_train.py <micro_train.py>`
+
+
+
+  .. container:: sphx-glr-download
+
+     :download:`Download Jupyter notebook: micro_train.ipynb <micro_train.ipynb>`
+
+
+.. only:: html
+
+ .. rst-class:: sphx-glr-signature
+
+    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
diff --git a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
index 35440261e..dc444f77f 100644
--- a/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_microtvm/sg_execution_times.rst.txt
@@ -5,10 +5,11 @@
 
 Computation times
 =================
-**00:45.476** total execution time for **how_to_work_with_microtvm** files:
-
-- **00:41.207**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)
-- **00:03.698**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)
-- **00:00.198**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)
-- **00:00.193**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tvmc.py` (``micro_tvmc.py``)
-- **00:00.180**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_reference_vm.py` (``micro_reference_vm.py``)
+**04:53.618** total execution time for **how_to_work_with_microtvm** files:
+
+- **04:07.242**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_train.py` (``micro_train.py``)
+- **00:41.830**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_autotune.py` (``micro_autotune.py``)
+- **00:03.931**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tflite.py` (``micro_tflite.py``)
+- **00:00.208**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_tvmc.py` (``micro_tvmc.py``)
+- **00:00.205**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_ethosu.py` (``micro_ethosu.py``)
+- **00:00.202**: :ref:`sphx_glr_how_to_work_with_microtvm_micro_reference_vm.py` (``micro_reference_vm.py``)
diff --git a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
index ab150c647..3bd2df82f 100644
--- a/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_relay/sg_execution_times.rst.txt
@@ -5,8 +5,8 @@
 
 Computation times
 =================
-**00:06.408** total execution time for **how_to_work_with_relay** files:
+**00:11.833** total execution time for **how_to_work_with_relay** files:
 
-- **00:04.518**: :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)
-- **00:01.686**: :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)
-- **00:00.203**: :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)
+- **00:09.937**: :ref:`sphx_glr_how_to_work_with_relay_using_external_lib.py` (``using_external_lib.py``)
+- **00:01.677**: :ref:`sphx_glr_how_to_work_with_relay_build_gcn.py` (``build_gcn.py``)
+- **00:00.219**: :ref:`sphx_glr_how_to_work_with_relay_using_relay_viz.py` (``using_relay_viz.py``)
diff --git a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
index be03919ac..18049d988 100644
--- a/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/sg_execution_times.rst.txt
@@ -5,13 +5,13 @@
 
 Computation times
 =================
-**00:05.799** total execution time for **how_to_work_with_schedules** files:
+**00:05.588** total execution time for **how_to_work_with_schedules** files:
 
-- **00:02.158**: :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)
-- **00:01.248**: :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)
-- **00:00.739**: :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)
-- **00:00.719**: :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)
+- **00:02.069**: :ref:`sphx_glr_how_to_work_with_schedules_intrin_math.py` (``intrin_math.py``)
+- **00:01.121**: :ref:`sphx_glr_how_to_work_with_schedules_tensorize.py` (``tensorize.py``)
+- **00:00.734**: :ref:`sphx_glr_how_to_work_with_schedules_reduction.py` (``reduction.py``)
+- **00:00.692**: :ref:`sphx_glr_how_to_work_with_schedules_scan.py` (``scan.py``)
 - **00:00.290**: :ref:`sphx_glr_how_to_work_with_schedules_extern_op.py` (``extern_op.py``)
-- **00:00.223**: :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``)
-- **00:00.219**: :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)
-- **00:00.204**: :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)
+- **00:00.233**: :ref:`sphx_glr_how_to_work_with_schedules_schedule_primitives.py` (``schedule_primitives.py``)
+- **00:00.225**: :ref:`sphx_glr_how_to_work_with_schedules_tuple_inputs.py` (``tuple_inputs.py``)
+- **00:00.224**: :ref:`sphx_glr_how_to_work_with_schedules_tedd.py` (``tedd.py``)
diff --git a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
index b53ab7f49..25562b98c 100644
--- a/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
+++ b/docs/_sources/how_to/work_with_schedules/tensorize.rst.txt
@@ -318,7 +318,7 @@ The importing needs to happen before the tensorized GEMV being executed.
                  C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
       buffer_map = {A_1: A, B_1: B, C_1: C}
       preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp5oria4hu/input0.cc'\nsource_filename = \"/tmp/tmp5oria4hu/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
+      attr [IterVar(i: int32, (nullptr), "DataPar", "")] "pragma_import_llvm" = "; ModuleID = '/tmp/tmp992t4uwf/input0.cc'\nsource_filename = \"/tmp/tmp992t4uwf/input0.cc\"\ntarget datalayout = \"e-m:e-i64:64-f80:128-n8:16:32:64-S128\"\ntarget triple = \"x86_64-pc-linux-gnu\"\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = alloca float*, align 8\n  %8 = alloca float*, align 8\n  %9 = alloca floa [...]
       for (i, 0, 1024) {
         for (j.outer: int32, 0, 32) {
           @tir.call_extern("gemv_update", @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
diff --git a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
index 4a4e72339..d7733e975 100644
--- a/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/autotvm/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
 
 Computation times
 =================
-**00:20.898** total execution time for **topic_vta_tutorials_autotvm** files:
+**00:21.548** total execution time for **topic_vta_tutorials_autotvm** files:
 
-- **00:20.710**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``)
-- **00:00.188**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)
+- **00:21.332**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_relay_vta.py` (``tune_relay_vta.py``)
+- **00:00.215**: :ref:`sphx_glr_topic_vta_tutorials_autotvm_tune_alu_vta.py` (``tune_alu_vta.py``)
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
index 1c9a13c25..7b4a4aace 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_classification.rst.txt
@@ -267,7 +267,7 @@ The compilation steps are:
       DeprecationWarning,
     /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
       relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-    resnet18_v1 inference graph built in 21.37s!
+    resnet18_v1 inference graph built in 21.96s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
index 32e88b8ae..a3faa0260 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/deploy_detection.rst.txt
@@ -303,7 +303,7 @@ The compilation steps are:
       "target_host parameter is going to be deprecated. "
     /workspace/python/tvm/relay/build_module.py:389: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
       DeprecationWarning,
-    yolov3-tiny inference graph built in 15.05s!
+    yolov3-tiny inference graph built in 15.44s!
 
 
 
diff --git a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
index f53cf8940..8655e354d 100644
--- a/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/frontend/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
 
 Computation times
 =================
-**01:29.370** total execution time for **topic_vta_tutorials_frontend** files:
+**01:30.558** total execution time for **topic_vta_tutorials_frontend** files:
 
-- **00:47.481**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)
-- **00:41.889**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``)
+- **00:47.890**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_detection.py` (``deploy_detection.py``)
+- **00:42.668**: :ref:`sphx_glr_topic_vta_tutorials_frontend_deploy_classification.py` (``deploy_classification.py``)
diff --git a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
index b1da520dd..eda394ca1 100644
--- a/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/optimize/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
 
 Computation times
 =================
-**00:03.545** total execution time for **topic_vta_tutorials_optimize** files:
+**00:03.586** total execution time for **topic_vta_tutorials_optimize** files:
 
-- **00:02.977**: :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)
-- **00:00.568**: :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``)
+- **00:03.005**: :ref:`sphx_glr_topic_vta_tutorials_optimize_convolution_opt.py` (``convolution_opt.py``)
+- **00:00.581**: :ref:`sphx_glr_topic_vta_tutorials_optimize_matrix_multiply_opt.py` (``matrix_multiply_opt.py``)
diff --git a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
index 745242471..10b8e84e9 100644
--- a/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
+++ b/docs/_sources/topic/vta/tutorials/sg_execution_times.rst.txt
@@ -5,7 +5,7 @@
 
 Computation times
 =================
-**00:01.041** total execution time for **topic_vta_tutorials** files:
+**00:01.058** total execution time for **topic_vta_tutorials** files:
 
-- **00:00.526**: :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``)
-- **00:00.515**: :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``)
+- **00:00.539**: :ref:`sphx_glr_topic_vta_tutorials_matrix_multiply.py` (``matrix_multiply.py``)
+- **00:00.519**: :ref:`sphx_glr_topic_vta_tutorials_vta_get_started.py` (``vta_get_started.py``)
diff --git a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
index 145d4285d..bde6c0c5b 100644
--- a/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
+++ b/docs/_sources/tutorial/auto_scheduler_matmul_x86.rst.txt
@@ -306,7 +306,7 @@ We build the binary and check its correctness and performance.
 
  .. code-block:: none
 
-    Execution time of this operator: 93.672 ms
+    Execution time of this operator: 93.732 ms
 
 
 
@@ -402,7 +402,7 @@ resume the status and do more 5 trials.
     Resume search:
     /usr/local/lib/python3.7/dist-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
       warnings.warn(f'Old style callback is deprecated.  See: {link}', UserWarning)
-    *E
+
 
 
 
@@ -415,11 +415,6 @@ Expression (TE) language that demonstrates how TVM can optimize computational
 operations.
 
 
-.. rst-class:: sphx-glr-timing
-
-   **Total running time of the script:** ( 1 minutes  7.017 seconds)
-
-
 .. _sphx_glr_download_tutorial_auto_scheduler_matmul_x86.py:
 
 
diff --git a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
index d9f3f1796..af2a546a8 100644
--- a/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
+++ b/docs/_sources/tutorial/autotvm_relay_x86.rst.txt
@@ -280,7 +280,7 @@ standard deviation.
 
  .. code-block:: none
 
-    {'mean': 494.0058553899962, 'median': 493.83366904999093, 'std': 0.6866634716120491}
+    {'mean': 497.6316300500048, 'median': 497.7335548499923, 'std': 1.7081865449643931}
 
 
 
@@ -494,31 +494,31 @@ the tuning data to.
 
  .. code-block:: none
 
-
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.55/  17.55 GFLOPS | Progress: (4/20) | 5.92 s
    [Task  1/25]  Current/Best:    6.17/  17.55 GFLOPS | Progress: (8/20) | 8.90 s
    [Task  1/25]  Current/Best:   11.57/  22.76 GFLOPS | Progress: (12/20) | 11.32 s
    [Task  1/25]  Current/Best:   16.87/  22.78 GFLOPS | Progress: (16/20) | 12.99 s
    [Task  1/25]  Current/Best:   11.61/  23.96 GFLOPS | Progress: (20/20) | 14.71 s Done.
-
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.06/  12.92 GFLOPS | Progress: (4/20) | 3.78 s
    [Task  2/25]  Current/Best:   14.10/  18.85 GFLOPS | Progress: (8/20) | 5.08 s
    [Task  2/25]  Current/Best:   20.70/  20.70 GFLOPS | Progress: (12/20) | 6.41 s
    [Task  2/25]  Current/Best:   12.51/  20.70 GFLOPS | Progress: (16/20) | 7.65 s
    [Task  2/25]  Current/Best:   19.29/  20.70 GFLOPS | Progress: (20/20) | 9.24 s Done.
-
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.60 GFLOPS | Progress: (4/20) | 5.76 s
    [Task  3/25]  Current/Best:   15.61/  16.92 GFLOPS | Progress: (8/20) | 7.68 s
    [Task  3/25]  Current/Best:   14.92/  16.92 GFLOPS | Progress: (12/20) | 9.38 s
    [Task  3/25]  Current/Best:    7.21/  23.86 GFLOPS | Progress: (16/20) | 11.25 s
    [Task  3/25]  Current/Best:   11.36/  23.86 GFLOPS | Progress: (20/20) | 15.80 s Done.
-
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.57/  20.41 GFLOPS | Progress: (4/20) | 2.31 s
    [Task  4/25]  Current/Best:    6.84/  20.41 GFLOPS | Progress: (8/20) | 6.95 s
    [Task  4/25]  Current/Best:   22.51/  22.51 GFLOPS | Progress: (12/20) | 11.71 s
    [Task  4/25]  Current/Best:   17.37/  22.51 GFLOPS | Progress: (16/20) | 14.07 s
    [Task  4/25]  Current/Best:   13.40/  22.51 GFLOPS | Progress: (20/20) | 16.03 s Done.
-
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.70/  10.35 GFLOPS | Progress: (4/20) | 2.49 s
    [Task  5/25]  Current/Best:   11.78/  12.30 GFLOPS | Progress: (8/20) | 4.57 s
    [Task  5/25]  Current/Best:   11.83/  18.09 GFLOPS | Progress: (12/20) | 7.73 s
    [Task  5/25]  Current/Best:   11.83/  22.78 GFLOPS | Progress: (16/20) | 9.14 s
    [Task  5/25]  Current/Best:   12.07/  22.78 GFLOPS | Progress: (20/20) | 11.06 s Done.
-
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.27/  20.78 GFLOPS | Progress: (4/20) | 4.00 s
    [Task  6/25]  Current/Best:   19.02/  20.78 GFLOPS | Progress: (8/20) | 5.75 s
    [Task  6/25]  Current/Best:   13.22/  20.78 GFLOPS | Progress: (12/20) | 7.70 s
    [Task  6/25]  Current/Best:   19.89/  20.78 GFLOPS | Progress: (16/20) | 9.92 s
    [Task  6/25]  Current/Best:    3.73/  20.78 GFLOPS | Progress: (20/20) | 12.46 s Done.
-
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   11.21/  12.78 GFLOPS | Progress: (4/20) | 3.52 s
    [Task  7/25]  Current/Best:   20.28/  21.16 GFLOPS | Progress: (8/20) | 5.02 s
    [Task  7/25]  Current/Best:   16.11/  21.16 GFLOPS | Progress: (12/20) | 6.91 s
    [Task  7/25]  Current/Best:   12.27/  21.16 GFLOPS | Progress: (16/20) | 8.94 s
    [Task  7/25]  Current/Best:    6.31/  21.92 GFLOPS | Progress: (20/20) | 11.38 s Done.
-
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:    9.81/  14.29 GFLOPS | Progress: (4/20) | 2.83 s
    [Task  8/25]  Current/Best:    9.59/  14.29 GFLOPS | Progress: (8/20) | 7.92 s
    [Task  8/25]  Current/Best:   12.69/  14.29 GFLOPS | Progress: (12/20) | 14.29 s
    [Task  8/25]  Current/Best:   18.83/  18.83 GFLOPS | Progress: (16/20) | 16.37 s
    [Task  8/25]  Current/Best:   19.88/  19.88 GFLOPS | Progress: (20/20) | 23.33 s Done.
-
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.32/  14.32 GFLOPS | Progress: (4/20) | 11.89 s
    [Task  9/25]  Current/Best:   23.50/  23.50 GFLOPS | Progress: (8/20) | 13.59 s
    [Task  9/25]  Current/Best:    8.30/  23.50 GFLOPS | Progress: (12/20) | 16.11 s
    [Task  9/25]  Current/Best:   17.91/  23.50 GFLOPS | Progress: (16/20) | 18.99 s
    [Task  9/25]  Current/Best:    9.07/  23.50 GFLOPS | Progress: (20/20) | 27.51 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.05/  18.05 GFLOPS | Progress: (4/20) | 2.52 s
    [Task 10/25]  Current/Best:   15.39/  18.05 GFLOPS | Progress: (8/20) | 4.13 s
    [Task 10/25]  Current/Best:   12.58/  18.68 GFLOPS | Progress: (12/20) | 5.66 s
    [Task 10/25]  Current/Best:   19.00/  20.37 GFLOPS | Progress: (16/20) | 6.76 s
    [Task 10/25]  Current/Best:    8.86/  20.37 GFLOPS | Progress: (20/20) | 8.30 s Done.
-
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   12.36/  18.17 GFLOPS | Progress: (4/20) | 3.24 s
    [Task 11/25]  Current/Best:   16.72/  18.17 GFLOPS | Progress: (8/20) | 6.03 s
    [Task 11/25]  Current/Best:   17.58/  18.17 GFLOPS | Progress: (12/20) | 8.09 s
    [Task 11/25]  Current/Best:   13.46/  21.23 GFLOPS | Progress: (16/20) | 10.93 s
    [Task 11/25]  Current/Best:   19.38/  21.64 GFLOPS | Progress: (20/20) | 13.00 s Done.
-
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.86/  18.07 GFLOPS | Progress: (4/20) | 5.63 s
    [Task 12/25]  Current/Best:    5.23/  18.07 GFLOPS | Progress: (8/20) | 9.50 s
    [Task 12/25]  Current/Best:   18.88/  18.88 GFLOPS | Progress: (12/20) | 11.47 s
    [Task 12/25]  Current/Best:   15.53/  18.88 GFLOPS | Progress: (16/20) | 14.38 s
    [Task 12/25]  Current/Best:   15.17/  18.88 GFLOPS | Progress: (20/20) | 16.28 s Done.
-
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.20/  17.34 GFLOPS | Progress: (4/20) | 3.64 s
    [Task 13/25]  Current/Best:   16.03/  21.01 GFLOPS | Progress: (8/20) | 6.19 s
    [Task 13/25]  Current/Best:   19.72/  21.78 GFLOPS | Progress: (12/20) | 9.23 s
    [Task 13/25]  Current/Best:   12.34/  21.78 GFLOPS | Progress: (16/20) | 12.64 s
    [Task 13/25]  Current/Best:   18.56/  21.78 GFLOPS | Progress: (20/20) | 14.96 s Done.
-
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   13.69/  13.69 GFLOPS | Progress: (4/20) | 3.33 s
    [Task 14/25]  Current/Best:    6.00/  13.69 GFLOPS | Progress: (8/20) | 5.52 s
    [Task 14/25]  Current/Best:   20.86/  20.86 GFLOPS | Progress: (12/20) | 8.20 s
    [Task 14/25]  Current/Best:   16.07/  20.86 GFLOPS | Progress: (16/20) | 9.85 s Done.
-
    [Task 14/25]  Current/Best:   17.35/  20.86 GFLOPS | Progress: (20/20) | 11.52 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.13/  17.67 GFLOPS | Progress: (4/20) | 2.59 s
    [Task 15/25]  Current/Best:   14.40/  18.07 GFLOPS | Progress: (8/20) | 3.94 s
    [Task 15/25]  Current/Best:   10.39/  22.37 GFLOPS | Progress: (12/20) | 6.15 s
    [Task 15/25]  Current/Best:   20.24/  22.37 GFLOPS | Progress: (16/20) | 9.27 s
    [Task 15/25]  Current/Best:    9.54/  22.37 GFLOPS | Progress: (20/20) | 10.29 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   19.62/  19.62 GFLOPS | Progress: (4/20) | 2.81 s
    [Task 16/25]  Current/Best:    3.04/  19.62 GFLOPS | Progress: (8/20) | 4.41 s
    [Task 16/25]  Current/Best:   19.40/  19.62 GFLOPS | Progress: (12/20) | 5.63 s
    [Task 16/25]  Current/Best:   17.86/  19.62 GFLOPS | Progress: (16/20) | 7.01 s
    [Task 16/25]  Current/Best:    9.98/  22.44 GFLOPS | Progress: (20/20) | 9.15 s Done.
-
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   12.92/  18.67 GFLOPS | Progress: (4/20) | 4.70 s
    [Task 17/25]  Current/Best:   14.47/  23.46 GFLOPS | Progress: (8/20) | 7.54 s
    [Task 17/25]  Current/Best:   16.72/  23.46 GFLOPS | Progress: (12/20) | 9.58 s
    [Task 17/25]  Current/Best:   16.55/  23.46 GFLOPS | Progress: (16/20) | 11.80 s
    [Task 17/25]  Current/Best:   10.04/  23.46 GFLOPS | Progress: (20/20) | 13.93 s Done.
-
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.38/  16.89 GFLOPS | Progress: (4/20) | 3.72 s
    [Task 18/25]  Current/Best:   10.53/  20.06 GFLOPS | Progress: (8/20) | 7.32 s
    [Task 18/25]  Current/Best:   19.31/  20.06 GFLOPS | Progress: (12/20) | 9.24 s
    [Task 18/25]  Current/Best:   10.09/  20.06 GFLOPS | Progress: (16/20) | 13.07 s
    [Task 18/25]  Current/Best:   20.52/  20.52 GFLOPS | Progress: (20/20) | 14.59 s Done.
-
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    7.06/  20.42 GFLOPS | Progress: (4/20) | 6.04 s
    [Task 19/25]  Current/Best:    2.61/  20.42 GFLOPS | Progress: (8/20) | 9.43 s
    [Task 19/25]  Current/Best:   20.20/  21.87 GFLOPS | Progress: (12/20) | 12.41 s
    [Task 19/25]  Current/Best:   14.12/  21.87 GFLOPS | Progress: (16/20) | 15.44 s
    [Task 19/25]  Current/Best:    2.71/  23.80 GFLOPS | Progress: (20/20) | 18.26 s Done.
-
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    8.06/  15.05 GFLOPS | Progress: (4/20) | 3.24 s Done.
+
    [Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  1/25]  Current/Best:   17.44/  17.44 GFLOPS | Progress: (4/20) | 6.03 s
    [Task  1/25]  Current/Best:    6.17/  17.44 GFLOPS | Progress: (8/20) | 8.99 s
    [Task  1/25]  Current/Best:   11.53/  22.67 GFLOPS | Progress: (12/20) | 11.44 s
    [Task  1/25]  Current/Best:   16.77/  22.67 GFLOPS | Progress: (16/20) | 13.14 s
    [Task  1/25]  Current/Best:   11.59/  23.91 GFLOPS | Progress: (20/20) | 14.88 s Done.
+
    [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  2/25]  Current/Best:   12.30/  13.27 GFLOPS | Progress: (4/20) | 3.72 s
    [Task  2/25]  Current/Best:   13.62/  18.58 GFLOPS | Progress: (8/20) | 5.06 s
    [Task  2/25]  Current/Best:   21.17/  21.17 GFLOPS | Progress: (12/20) | 6.37 s
    [Task  2/25]  Current/Best:   12.92/  21.17 GFLOPS | Progress: (16/20) | 7.68 s
    [Task  2/25]  Current/Best:   19.54/  21.17 GFLOPS | Progress: (20/20) | 9.31 s Done.
+
    [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  3/25]  Current/Best:    1.63/  10.54 GFLOPS | Progress: (4/20) | 5.84 s
    [Task  3/25]  Current/Best:   15.55/  16.85 GFLOPS | Progress: (8/20) | 7.78 s
    [Task  3/25]  Current/Best:   14.88/  16.85 GFLOPS | Progress: (12/20) | 9.50 s
    [Task  3/25]  Current/Best:    7.02/  23.68 GFLOPS | Progress: (16/20) | 11.43 s
    [Task  3/25]  Current/Best:   11.25/  23.68 GFLOPS | Progress: (20/20) | 16.02 s Done.
+
    [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  4/25]  Current/Best:    9.17/  20.53 GFLOPS | Progress: (4/20) | 2.37 s
    [Task  4/25]  Current/Best:    6.78/  20.53 GFLOPS | Progress: (8/20) | 7.11 s
    [Task  4/25]  Current/Best:   21.61/  21.61 GFLOPS | Progress: (12/20) | 11.94 s
    [Task  4/25]  Current/Best:   17.25/  21.61 GFLOPS | Progress: (16/20) | 14.34 s
    [Task  4/25]  Current/Best:   13.20/  21.61 GFLOPS | Progress: (20/20) | 16.41 s Done.
+
    [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  5/25]  Current/Best:    9.58/  10.20 GFLOPS | Progress: (4/20) | 2.56 s
    [Task  5/25]  Current/Best:   11.80/  12.70 GFLOPS | Progress: (8/20) | 4.64 s
    [Task  5/25]  Current/Best:   11.77/  18.04 GFLOPS | Progress: (12/20) | 7.83 s
    [Task  5/25]  Current/Best:   11.70/  22.56 GFLOPS | Progress: (16/20) | 9.25 s
    [Task  5/25]  Current/Best:   12.09/  22.56 GFLOPS | Progress: (20/20) | 11.17 s Done.
+
    [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  6/25]  Current/Best:   12.20/  20.80 GFLOPS | Progress: (4/20) | 4.00 s
    [Task  6/25]  Current/Best:   19.00/  20.80 GFLOPS | Progress: (8/20) | 5.77 s
    [Task  6/25]  Current/Best:   13.28/  20.80 GFLOPS | Progress: (12/20) | 7.73 s
    [Task  6/25]  Current/Best:   19.98/  20.80 GFLOPS | Progress: (16/20) | 9.96 s
    [Task  6/25]  Current/Best:    3.71/  20.80 GFLOPS | Progress: (20/20) | 12.47 s Done.
+
    [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  7/25]  Current/Best:   11.24/  12.92 GFLOPS | Progress: (4/20) | 3.46 s
    [Task  7/25]  Current/Best:   19.99/  21.01 GFLOPS | Progress: (8/20) | 4.97 s
    [Task  7/25]  Current/Best:   15.86/  21.01 GFLOPS | Progress: (12/20) | 6.91 s
    [Task  7/25]  Current/Best:   12.27/  21.01 GFLOPS | Progress: (16/20) | 8.96 s
    [Task  7/25]  Current/Best:    6.43/  21.67 GFLOPS | Progress: (20/20) | 11.43 s Done.
+
    [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  8/25]  Current/Best:    9.97/  13.84 GFLOPS | Progress: (4/20) | 2.85 s
    [Task  8/25]  Current/Best:    9.40/  13.84 GFLOPS | Progress: (8/20) | 7.98 s
    [Task  8/25]  Current/Best:   12.71/  13.84 GFLOPS | Progress: (12/20) | 14.44 s
    [Task  8/25]  Current/Best:   19.01/  19.01 GFLOPS | Progress: (16/20) | 16.57 s
    [Task  8/25]  Current/Best:   19.87/  19.87 GFLOPS | Progress: (20/20) | 23.67 s Done.
+
    [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task  9/25]  Current/Best:   14.32/  15.77 GFLOPS | Progress: (4/20) | 11.88 s
    [Task  9/25]  Current/Best:   19.26/  19.88 GFLOPS | Progress: (8/20) | 13.64 s
    [Task  9/25]  Current/Best:    8.25/  19.88 GFLOPS | Progress: (12/20) | 16.13 s
    [Task  9/25]  Current/Best:   18.05/  19.88 GFLOPS | Progress: (16/20) | 18.96 s
    [Task  9/25]  Current/Best:    9.05/  19.88 GFLOPS | Progress: (20/20) | 27.46 s
    [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 10/25]  Current/Best:   18.12/  18.12 GFLOPS | Progress: (4/20) | 2.49 s
    [Task 10/25]  Current/Best:   15.54/  18.12 GFLOPS | Progress: (8/20) | 4.12 s
    [Task 10/25]  Current/Best:   12.59/  18.74 GFLOPS | Progress: (12/20) | 5.67 s
    [Task 10/25]  Current/Best:   19.19/  20.31 GFLOPS | Progress: (16/20) | 6.77 s
    [Task 10/25]  Current/Best:    8.83/  20.31 GFLOPS | Progress: (20/20) | 8.31 s Done.
+
    [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 11/25]  Current/Best:   12.23/  18.09 GFLOPS | Progress: (4/20) | 3.32 s
    [Task 11/25]  Current/Best:   16.85/  18.09 GFLOPS | Progress: (8/20) | 6.14 s
    [Task 11/25]  Current/Best:   18.11/  18.11 GFLOPS | Progress: (12/20) | 8.18 s
    [Task 11/25]  Current/Best:   13.13/  21.29 GFLOPS | Progress: (16/20) | 11.06 s
    [Task 11/25]  Current/Best:   19.54/  21.60 GFLOPS | Progress: (20/20) | 13.15 s Done.
+
    [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 12/25]  Current/Best:    7.83/  18.08 GFLOPS | Progress: (4/20) | 5.65 s
    [Task 12/25]  Current/Best:    5.20/  18.08 GFLOPS | Progress: (8/20) | 9.57 s
    [Task 12/25]  Current/Best:   18.93/  18.93 GFLOPS | Progress: (12/20) | 11.57 s
    [Task 12/25]  Current/Best:   13.94/  18.93 GFLOPS | Progress: (16/20) | 14.54 s
    [Task 12/25]  Current/Best:   15.04/  18.93 GFLOPS | Progress: (20/20) | 16.48 s Done.
+
    [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 13/25]  Current/Best:    8.74/  17.34 GFLOPS | Progress: (4/20) | 3.73 s
    [Task 13/25]  Current/Best:   16.06/  20.75 GFLOPS | Progress: (8/20) | 6.33 s
    [Task 13/25]  Current/Best:   19.55/  20.75 GFLOPS | Progress: (12/20) | 9.49 s
    [Task 13/25]  Current/Best:   12.23/  20.75 GFLOPS | Progress: (16/20) | 13.01 s
    [Task 13/25]  Current/Best:   18.34/  20.75 GFLOPS | Progress: (20/20) | 15.35 s Done.
+
    [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 14/25]  Current/Best:   13.67/  13.67 GFLOPS | Progress: (4/20) | 3.42 s
    [Task 14/25]  Current/Best:    6.05/  13.67 GFLOPS | Progress: (8/20) | 5.61 s
    [Task 14/25]  Current/Best:   20.34/  20.34 GFLOPS | Progress: (12/20) | 8.32 s
    [Task 14/25]  Current/Best:   15.83/  20.34 GFLOPS | Progress: (16/20) | 9.98 s Done.
+
    [Task 14/25]  Current/Best:   17.61/  20.34 GFLOPS | Progress: (20/20) | 11.75 s
    [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 15/25]  Current/Best:   16.18/  17.60 GFLOPS | Progress: (4/20) | 2.66 s
    [Task 15/25]  Current/Best:   14.40/  17.60 GFLOPS | Progress: (8/20) | 3.95 s
    [Task 15/25]  Current/Best:   10.41/  22.34 GFLOPS | Progress: (12/20) | 6.20 s
    [Task 15/25]  Current/Best:   20.44/  22.34 GFLOPS | Progress: (16/20) | 9.54 s
    [Task 15/25]  Current/Best:    9.71/  22.34 GFLOPS | Progress: (20/20) | 10.55 s
    [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 16/25]  Current/Best:   20.66/  20.66 GFLOPS | Progress: (4/20) | 2.84 s
    [Task 16/25]  Current/Best:    3.04/  20.66 GFLOPS | Progress: (8/20) | 4.48 s
    [Task 16/25]  Current/Best:   19.72/  20.66 GFLOPS | Progress: (12/20) | 5.69 s
    [Task 16/25]  Current/Best:   17.87/  20.66 GFLOPS | Progress: (16/20) | 7.03 s
    [Task 16/25]  Current/Best:    9.99/  20.66 GFLOPS | Progress: (20/20) | 9.19 s Done.
+
    [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 17/25]  Current/Best:   13.11/  18.79 GFLOPS | Progress: (4/20) | 4.71 s
    [Task 17/25]  Current/Best:   13.83/  23.24 GFLOPS | Progress: (8/20) | 7.63 s
    [Task 17/25]  Current/Best:   17.17/  23.24 GFLOPS | Progress: (12/20) | 9.70 s
    [Task 17/25]  Current/Best:   16.61/  23.24 GFLOPS | Progress: (16/20) | 11.90 s
    [Task 17/25]  Current/Best:   10.02/  23.24 GFLOPS | Progress: (20/20) | 14.08 s Done.
+
    [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 18/25]  Current/Best:   11.32/  17.62 GFLOPS | Progress: (4/20) | 3.76 s
    [Task 18/25]  Current/Best:   10.51/  19.59 GFLOPS | Progress: (8/20) | 7.41 s
    [Task 18/25]  Current/Best:   19.40/  19.59 GFLOPS | Progress: (12/20) | 9.32 s
    [Task 18/25]  Current/Best:    9.84/  19.59 GFLOPS | Progress: (16/20) | 13.25 s
    [Task 18/25]  Current/Best:   20.69/  20.69 GFLOPS | Progress: (20/20) | 14.77 s Done.
+
    [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 19/25]  Current/Best:    6.99/  20.25 GFLOPS | Progress: (4/20) | 6.15 s
    [Task 19/25]  Current/Best:    2.61/  20.25 GFLOPS | Progress: (8/20) | 9.50 s
    [Task 19/25]  Current/Best:   18.96/  20.68 GFLOPS | Progress: (12/20) | 12.46 s
    [Task 19/25]  Current/Best:   13.57/  20.68 GFLOPS | Progress: (16/20) | 15.48 s
    [Task 19/25]  Current/Best:    2.70/  23.00 GFLOPS | Progress: (20/20) | 18.27 s Done.
+
    [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 20/25]  Current/Best:    8.89/  14.89 GFLOPS | Progress: (4/20) | 3.36 s Done.
      Done.
-
    [Task 20/25]  Current/Best:    9.82/  15.05 GFLOPS | Progress: (8/20) | 6.82 s
    [Task 20/25]  Current/Best:    2.32/  16.51 GFLOPS | Progress: (12/20) | 10.80 s
    [Task 20/25]  Current/Best:   12.40/  16.51 GFLOPS | Progress: (16/20) | 14.50 s
    [Task 20/25]  Current/Best:   11.61/  22.12 GFLOPS | Progress: (20/20) | 16.61 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.42/  17.71 GFLOPS | Progress: (4/20) | 3.19 s
    [Task 21/25]  Current/Best:   14.68/  17.71 GFLOPS | Progress: (8/20) | 4.79 s
    [Task 21/25]  Current/Best:    1.61/  17.71 GFLOPS | Progress: (12/20) | 6.91 s
    [Task 21/25]  Current/Best:   18.19/  18.19 GFLOPS | Progress: (16/20) | 10.36 s
    [Task 21/25]  Current/Best:    4.46/  18.19 GFLOPS | Progress: (20/20) | 17.55 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.71/  17.01 GFLOPS | Progress: (4/20) | 2.59 s
    [Task 22/25]  Current/Best:    8.76/  21.75 GFLOPS | Progress: (8/20) | 4.63 s
    [Task 22/25]  Current/Best:   20.02/  21.75 GFLOPS | Progress: (12/20) | 6.98 s
    [Task 22/25]  Current/Best:   15.30/  21.75 GFLOPS | Progress: (16/20) | 9.13 s
    [Task 22/25]  Current/Best:   14.45/  21.75 GFLOPS | Progress: (20/20) | 10.84 s Done.
-
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.68/  21.00 GFLOPS | Progress: (4/20) | 3.18 s
    [Task 23/25]  Current/Best:   13.70/  21.00 GFLOPS | Progress: (8/20) | 6.66 s
    [Task 23/25]  Current/Best:   21.02/  21.90 GFLOPS | Progress: (12/20) | 8.49 s
    [Task 23/25]  Current/Best:    6.51/  21.90 GFLOPS | Progress: (16/20) | 15.50 s
    [Task 23/25]  Current/Best:    7.96/  21.90 GFLOPS | Progress: (20/20) | 19.70 s Done.
-
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.52/   8.52 GFLOPS | Progress: (4/20) | 11.71 s
    [Task 24/25]  Current/Best:    3.77/   8.52 GFLOPS | Progress: (8/20) | 22.88 s
    [Task 24/25]  Current/Best:    4.48/   8.52 GFLOPS | Progress: (12/20) | 33.60 s Done.
+
    [Task 20/25]  Current/Best:   10.36/  14.89 GFLOPS | Progress: (8/20) | 6.78 s
    [Task 20/25]  Current/Best:    2.32/  16.68 GFLOPS | Progress: (12/20) | 10.78 s
    [Task 20/25]  Current/Best:   12.55/  16.68 GFLOPS | Progress: (16/20) | 14.79 s
    [Task 20/25]  Current/Best:   13.47/  21.67 GFLOPS | Progress: (20/20) | 16.92 s
    [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 21/25]  Current/Best:    6.37/  17.70 GFLOPS | Progress: (4/20) | 3.25 s
    [Task 21/25]  Current/Best:   14.62/  17.70 GFLOPS | Progress: (8/20) | 4.87 s
    [Task 21/25]  Current/Best:    1.61/  17.70 GFLOPS | Progress: (12/20) | 7.01 s
    [Task 21/25]  Current/Best:   17.13/  17.70 GFLOPS | Progress: (16/20) | 10.53 s
    [Task 21/25]  Current/Best:    4.47/  17.70 GFLOPS | Progress: (20/20) | 18.00 s
    [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 22/25]  Current/Best:    2.70/  17.05 GFLOPS | Progress: (4/20) | 2.65 s
    [Task 22/25]  Current/Best:    8.66/  21.52 GFLOPS | Progress: (8/20) | 4.68 s
    [Task 22/25]  Current/Best:   19.76/  21.52 GFLOPS | Progress: (12/20) | 7.04 s
    [Task 22/25]  Current/Best:   15.32/  21.52 GFLOPS | Progress: (16/20) | 9.17 s
    [Task 22/25]  Current/Best:   13.02/  21.52 GFLOPS | Progress: (20/20) | 10.92 s Done.
+
    [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 23/25]  Current/Best:   17.32/  20.80 GFLOPS | Progress: (4/20) | 3.18 s
    [Task 23/25]  Current/Best:   15.77/  20.80 GFLOPS | Progress: (8/20) | 6.65 s
    [Task 23/25]  Current/Best:   20.67/  21.24 GFLOPS | Progress: (12/20) | 8.51 s
    [Task 23/25]  Current/Best:    6.22/  21.24 GFLOPS | Progress: (16/20) | 15.63 s
    [Task 23/25]  Current/Best:    7.49/  21.24 GFLOPS | Progress: (20/20) | 19.90 s Done.
+
    [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 24/25]  Current/Best:    8.27/   8.27 GFLOPS | Progress: (4/20) | 11.75 s
    [Task 24/25]  Current/Best:    3.61/   8.27 GFLOPS | Progress: (8/20) | 22.96 s
    [Task 24/25]  Current/Best:    4.31/   8.27 GFLOPS | Progress: (12/20) | 33.68 s Done.
      Done.
-
    [Task 24/25]  Current/Best:    6.24/   8.84 GFLOPS | Progress: (16/20) | 39.53 s
    [Task 24/25]  Current/Best:    3.41/   8.89 GFLOPS | Progress: (20/20) | 45.50 s Done.
-
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.75 GFLOPS | Progress: (4/20) | 11.53 s
    [Task 25/25]  Current/Best:    6.24/   8.03 GFLOPS | Progress: (8/20) | 22.76 s
    [Task 25/25]  Current/Best:    5.99/   8.03 GFLOPS | Progress: (12/20) | 34.01 s
    [Task 25/25]  Current/Best:    5.80/   8.79 GFLOPS | Progress: (16/20) | 35.87 s
    [Task 25/25]  Current/Best:    2.84/   9.40 GFLOPS | Progress: (20/20) | 46.56 s
+
    [Task 24/25]  Current/Best:    6.07/   8.81 GFLOPS | Progress: (16/20) | 39.46 s
    [Task 24/25]  Current/Best:    3.32/   8.81 GFLOPS | Progress: (20/20) | 45.51 s Done.
+
    [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
    [Task 25/25]  Current/Best:    1.55/   2.94 GFLOPS | Progress: (4/20) | 11.55 s
    [Task 25/25]  Current/Best:    5.78/   8.61 GFLOPS | Progress: (8/20) | 22.77 s
    [Task 25/25]  Current/Best:    5.99/   8.61 GFLOPS | Progress: (12/20) | 34.23 s
    [Task 25/25]  Current/Best:    5.87/   8.89 GFLOPS | Progress: (16/20) | 36.12 s
    [Task 25/25]  Current/Best:    2.84/   8.89 GFLOPS | Progress: (20/20) | 46.85 s
 
 
 The output from this tuning process will look something like this:
@@ -660,8 +660,8 @@ improvement in comparing the optimized model to the unoptimized model.
 
  .. code-block:: none
 
-    optimized: {'mean': 411.31273892002355, 'median': 411.2051228000382, 'std': 0.8036804155483971}
-    unoptimized: {'mean': 494.0058553899962, 'median': 493.83366904999093, 'std': 0.6866634716120491}
+    optimized: {'mean': 416.6099685100062, 'median': 416.9126989500228, 'std': 1.5312354032066489}
+    unoptimized: {'mean': 497.6316300500048, 'median': 497.7335548499923, 'std': 1.7081865449643931}
 
 
 
@@ -681,7 +681,7 @@ profiling/benchmarking.
 
 .. rst-class:: sphx-glr-timing
 
-   **Total running time of the script:** ( 10 minutes  20.581 seconds)
+   **Total running time of the script:** ( 10 minutes  25.060 seconds)
 
 
 .. _sphx_glr_download_tutorial_autotvm_relay_x86.py:
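
For context, the optimized/unoptimized dictionaries in the hunk above are plain
timing statistics in milliseconds. A minimal sketch of how such numbers can be
collected, assuming a compiled graph executor `module` with its input already
set, as in the autotvm_relay_x86 tutorial:

    import timeit
    import numpy as np

    timing_number = 10  # runs per measurement
    timing_repeat = 10  # independent measurements
    timings = (
        np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
        * 1000 / timing_number  # convert to milliseconds per run
    )
    print({"mean": np.mean(timings), "median": np.median(timings), "std": np.std(timings)})
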
diff --git a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
index 73204dff7..9d57652ba 100644
--- a/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
+++ b/docs/_sources/tutorial/cross_compilation_and_rpc.rst.txt
@@ -235,7 +235,7 @@ device and returns the measured cost. Network overhead is excluded.
 
  .. code-block:: none
 
-    1.289e-07 secs/op
+    1.333e-07 secs/op
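
The secs/op figure above is a per-call kernel cost measured over RPC. A minimal
sketch of the measurement, assuming `func` is a module uploaded to the remote
device `dev` with arguments `a` and `b` already allocated there, as in the
cross-compilation tutorial:

    # time_evaluator runs the function several times on the remote device
    # and returns averaged timing; network overhead is excluded.
    time_f = func.time_evaluator(func.entry_name, dev, number=10)
    cost = time_f(a, b).mean
    print("%g secs/op" % cost)
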
 
 
 
diff --git a/docs/_sources/tutorial/intro_topi.rst.txt b/docs/_sources/tutorial/intro_topi.rst.txt
index ff4b8dbd9..e1c0d3787 100644
--- a/docs/_sources/tutorial/intro_topi.rst.txt
+++ b/docs/_sources/tutorial/intro_topi.rst.txt
@@ -233,7 +233,7 @@ As you can see, scheduled stages of computation have been accumulated and we can
 
  .. code-block:: none
 
-    [stage(a, placeholder(a, 0xc364c90)), stage(b, placeholder(b, 0xc320ce0)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min= [...]
+    [stage(a, placeholder(a, 0x1b0ab4b0)), stage(b, placeholder(b, 0x22413b70)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(mi [...]
 
 
 
diff --git a/docs/_sources/tutorial/sg_execution_times.rst.txt b/docs/_sources/tutorial/sg_execution_times.rst.txt
index 1cbdc7526..3a8965328 100644
--- a/docs/_sources/tutorial/sg_execution_times.rst.txt
+++ b/docs/_sources/tutorial/sg_execution_times.rst.txt
@@ -5,17 +5,17 @@
 
 Computation times
 =================
-**13:19.803** total execution time for **tutorial** files:
+**13:08.446** total execution time for **tutorial** files:
 
-- **10:20.581**: :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)
-- **01:07.017**: :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``)
-- **00:59.416**: :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)
-- **00:27.995**: :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)
-- **00:23.214**: :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)
-- **00:00.715**: :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)
-- **00:00.536**: :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)
-- **00:00.191**: :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``)
-- **00:00.038**: :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)
-- **00:00.035**: :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)
-- **00:00.033**: :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)
-- **00:00.032**: :ref:`sphx_glr_tutorial_install.py` (``install.py``)
+- **10:25.060**: :ref:`sphx_glr_tutorial_autotvm_relay_x86.py` (``autotvm_relay_x86.py``)
+- **01:00.791**: :ref:`sphx_glr_tutorial_tensor_expr_get_started.py` (``tensor_expr_get_started.py``)
+- **00:49.167**: :ref:`sphx_glr_tutorial_auto_scheduler_matmul_x86.py` (``auto_scheduler_matmul_x86.py``)
+- **00:28.130**: :ref:`sphx_glr_tutorial_relay_quick_start.py` (``relay_quick_start.py``)
+- **00:23.603**: :ref:`sphx_glr_tutorial_autotvm_matmul_x86.py` (``autotvm_matmul_x86.py``)
+- **00:00.750**: :ref:`sphx_glr_tutorial_intro_topi.py` (``intro_topi.py``)
+- **00:00.568**: :ref:`sphx_glr_tutorial_tensor_ir_blitz_course.py` (``tensor_ir_blitz_course.py``)
+- **00:00.211**: :ref:`sphx_glr_tutorial_cross_compilation_and_rpc.py` (``cross_compilation_and_rpc.py``)
+- **00:00.043**: :ref:`sphx_glr_tutorial_introduction.py` (``introduction.py``)
+- **00:00.042**: :ref:`sphx_glr_tutorial_install.py` (``install.py``)
+- **00:00.041**: :ref:`sphx_glr_tutorial_tvmc_command_line_driver.py` (``tvmc_command_line_driver.py``)
+- **00:00.039**: :ref:`sphx_glr_tutorial_tvmc_python.py` (``tvmc_python.py``)
diff --git a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
index 5084405fe..e5c465f92 100644
--- a/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
+++ b/docs/_sources/tutorial/tensor_expr_get_started.rst.txt
@@ -252,8 +252,8 @@ helper function to run a profile of the TVM generated code.
 
  .. code-block:: none
 
-    Numpy running time: 0.000007
-    naive: 0.000008
+    Numpy running time: 0.000008
+    naive: 0.000006
 
 
 
@@ -344,7 +344,7 @@ compile and run this new schedule with the parallel operation applied:
 
  .. code-block:: none
 
-    parallel: 0.000009
+    parallel: 0.000006
 
 
 
@@ -447,10 +447,10 @@ We can now compare the different schedules
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                   numpy    7.089529999575461e-06                    1.0
-                   naive              7.9637e-06       1.123304365800961
-                parallel              8.8924e-06      1.2543003556699102
-                  vector    2.4648300000000003e-05     3.476718485072495
+                   numpy    8.059339997998904e-06                    1.0
+                   naive    5.967000000000001e-06     0.7403832077417728
+                parallel    6.3133999999999996e-06    0.7833643947975376
+                  vector    2.4610499999999998e-05    3.0536619631521527
 
 
 
@@ -839,7 +839,7 @@ matrix multiplication.
 
  .. code-block:: none
 
-    Numpy running time: 0.017892
+    Numpy running time: 0.018763
 
 
 
@@ -897,7 +897,7 @@ optimizations.
 
     /workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
       "target_host parameter is going to be deprecated. "
-    none: 3.300033
+    none: 3.382148
 
 
 
@@ -996,7 +996,7 @@ schedule.
 
  .. code-block:: none
 
-    blocking: 0.300489
+    blocking: 0.297685
 
 
 
@@ -1088,7 +1088,7 @@ already cache friendly from our previous optimizations.
 
  .. code-block:: none
 
-    vectorization: 0.339779
+    vectorization: 0.337189
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1160,7 +1160,7 @@ more cache friendly.
 
  .. code-block:: none
 
-    loop permutation: 0.112024
+    loop permutation: 0.119975
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1257,7 +1257,7 @@ optimized schedule.
 
  .. code-block:: none
 
-    array packing: 0.108253
+    array packing: 0.110986
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1348,7 +1348,7 @@ to `C` when all the block results are ready.
 
  .. code-block:: none
 
-    block caching: 0.110626
+    block caching: 0.111675
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1432,7 +1432,7 @@ of thread-level parallelization.
 
  .. code-block:: none
 
-    parallelization: 0.145039
+    parallelization: 0.144948
     @main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
       attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
       buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
@@ -1511,13 +1511,13 @@ working, we can compare the results.
  .. code-block:: none
 
                 Operator                  Timing             Performance
-                    none      3.3000331350000005                     1.0
-                blocking            0.3004887988      0.0910562974695707
-           vectorization     0.33977934639999996     0.10296240446688724
-        loop permutation            0.1120241604    0.033946374420267746
-           array packing     0.10825315049999999    0.032803655621476045
-           block caching            0.1106262238    0.033522761522211196
-         parallelization            0.1450390134     0.04395077487608317
+                    none      3.3821478262999998                     1.0
+                blocking            0.2976846373      0.0880164477096972
+           vectorization            0.3371885208     0.09969656505785476
+        loop permutation     0.11997511350000001     0.03547305430207949
+           array packing     0.11098600200000001     0.03281524276879891
+           block caching     0.11167461740000002     0.03301884575582545
+         parallelization            0.1449481649    0.042856839010070806
 
 
 
@@ -1552,6 +1552,11 @@ operations with tunable parameters that allows you to automatically optimize
 the computation for specific platforms.
 
 
+.. rst-class:: sphx-glr-timing
+
+   **Total running time of the script:** ( 1 minutes  0.791 seconds)
+
+
 .. _sphx_glr_download_tutorial_tensor_expr_get_started.py:
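
The numpy/naive/parallel/vector rows in this file compare schedules of the same
vector-add computation. A minimal sketch of the two scheduling primitives those
rows exercise, following tensor_expr_get_started (names are the tutorial's):

    import tvm
    from tvm import te

    n = te.var("n")
    A = te.placeholder((n,), name="A")
    B = te.placeholder((n,), name="B")
    C = te.compute(A.shape, lambda i: A[i] + B[i], name="C")

    # "parallel" row: spread the single loop across CPU threads.
    s_par = te.create_schedule(C.op)
    s_par[C].parallel(C.op.axis[0])

    # "vector" row: split the loop, parallelize the outer part,
    # and vectorize the inner chunk.
    s_vec = te.create_schedule(C.op)
    outer, inner = s_vec[C].split(C.op.axis[0], factor=4)
    s_vec[C].parallel(outer)
    s_vec[C].vectorize(inner)

    fadd_parallel = tvm.build(s_par, [A, B, C], "llvm", name="myadd_parallel")
    fadd_vector = tvm.build(s_vec, [A, B, C], "llvm", name="myadd_vector")
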
 
 
diff --git a/docs/commit_hash b/docs/commit_hash
index 469b7e808..2c987802f 100644
--- a/docs/commit_hash
+++ b/docs/commit_hash
@@ -1 +1 @@
-6dbdf2e20116ecc6f5379f5cb430ed023ff0d62b
+f05ebde8e84e4bce620b0fdf839b89eb60c1008c
diff --git a/docs/how_to/compile_models/from_mxnet.html b/docs/how_to/compile_models/from_mxnet.html
index e48064915..fd63ec71e 100644
--- a/docs/how_to/compile_models/from_mxnet.html
+++ b/docs/how_to/compile_models/from_mxnet.html
@@ -401,7 +401,7 @@
 </div>
 <img alt="../../_images/sphx_glr_from_mxnet_001.png" class="sphx-glr-single-img" src="../../_images/sphx_glr_from_mxnet_001.png" />
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zip4b03aae9-9176-4f2b-b0fa-6faad5980f7a from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/resnet18_v1-a0666292.zipc9b717e6-28d6-45f6-8568-3018b30c9f29 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/resnet18_v1-a0666292.zip...
 x (1, 3, 224, 224)
 </pre></div>
 </div>
diff --git a/docs/how_to/compile_models/from_oneflow.html b/docs/how_to/compile_models/from_oneflow.html
index 6675ba10e..5fe8f25ee 100644
--- a/docs/how_to/compile_models/from_oneflow.html
+++ b/docs/how_to/compile_models/from_oneflow.html
@@ -406,43 +406,41 @@ python3 -m pip install -f https://release.oneflow.info <span class="nv">oneflow<
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://oneflow-public.oss-cn-beijing.aliyuncs.com/model_zoo/flowvision/classification/ResNet/resnet18.zip&quot; to /workspace/.oneflow/flowvision_cache/resnet18.zip
 
   0%|          | 0.00/41.5M [00:00&lt;?, ?B/s]
-  0%|          | 16.0k/41.5M [00:00&lt;08:15, 87.7kB/s]
-  0%|          | 48.0k/41.5M [00:00&lt;05:12, 139kB/s]
-  0%|          | 96.0k/41.5M [00:00&lt;03:42, 195kB/s]
-  0%|          | 160k/41.5M [00:00&lt;02:49, 256kB/s]
-  1%|          | 272k/41.5M [00:00&lt;01:52, 384kB/s]
-  1%|1         | 512k/41.5M [00:01&lt;01:01, 700kB/s]
-  2%|2         | 984k/41.5M [00:01&lt;00:32, 1.31MB/s]
-  5%|4         | 1.89M/41.5M [00:01&lt;00:16, 2.55MB/s]
-  8%|8         | 3.38M/41.5M [00:01&lt;00:09, 4.36MB/s]
- 12%|#1        | 4.87M/41.5M [00:01&lt;00:06, 5.58MB/s]
- 15%|#5        | 6.36M/41.5M [00:02&lt;00:05, 6.42MB/s]
- 19%|#8        | 7.84M/41.5M [00:02&lt;00:05, 7.00MB/s]
- 23%|##2       | 9.34M/41.5M [00:02&lt;00:04, 7.40MB/s]
- 26%|##6       | 10.8M/41.5M [00:02&lt;00:04, 7.68MB/s]
- 30%|##9       | 12.3M/41.5M [00:02&lt;00:03, 7.88MB/s]
- 33%|###3      | 13.8M/41.5M [00:02&lt;00:03, 8.01MB/s]
- 37%|###6      | 15.3M/41.5M [00:03&lt;00:03, 8.11MB/s]
- 40%|####      | 16.8M/41.5M [00:03&lt;00:03, 8.17MB/s]
- 44%|####4     | 18.3M/41.5M [00:03&lt;00:02, 8.21MB/s]
- 48%|####7     | 19.8M/41.5M [00:03&lt;00:02, 8.25MB/s]
- 51%|#####1    | 21.2M/41.5M [00:03&lt;00:02, 8.27MB/s]
- 55%|#####4    | 22.7M/41.5M [00:04&lt;00:02, 8.28MB/s]
- 58%|#####8    | 24.2M/41.5M [00:04&lt;00:02, 8.30MB/s]
- 62%|######1   | 25.7M/41.5M [00:04&lt;00:01, 8.30MB/s]
- 66%|######5   | 27.2M/41.5M [00:04&lt;00:01, 8.29MB/s]
- 69%|######9   | 28.7M/41.5M [00:04&lt;00:01, 8.28MB/s]
- 73%|#######2  | 30.1M/41.5M [00:05&lt;00:01, 8.29MB/s]
- 75%|#######4  | 31.1M/41.5M [00:05&lt;00:01, 7.26MB/s]
- 78%|#######8  | 32.6M/41.5M [00:05&lt;00:01, 7.58MB/s]
- 82%|########2 | 34.1M/41.5M [00:05&lt;00:00, 7.81MB/s]
- 85%|########5 | 35.4M/41.5M [00:06&lt;00:01, 5.98MB/s]
- 89%|########8 | 36.9M/41.5M [00:06&lt;00:00, 6.56MB/s]
- 91%|######### | 37.7M/41.5M [00:06&lt;00:00, 5.95MB/s]
- 94%|#########4| 39.2M/41.5M [00:06&lt;00:00, 6.59MB/s]
- 97%|#########6| 40.2M/41.5M [00:06&lt;00:00, 6.41MB/s]
- 98%|#########8| 40.9M/41.5M [00:06&lt;00:00, 5.56MB/s]
-100%|##########| 41.5M/41.5M [00:06&lt;00:00, 6.26MB/s]
+  0%|          | 16.0k/41.5M [00:00&lt;07:31, 96.3kB/s]
+  0%|          | 48.0k/41.5M [00:00&lt;04:45, 152kB/s]
+  0%|          | 96.0k/41.5M [00:00&lt;03:22, 214kB/s]
+  0%|          | 168k/41.5M [00:00&lt;02:24, 300kB/s]
+  1%|          | 352k/41.5M [00:00&lt;01:13, 589kB/s]
+  1%|1         | 616k/41.5M [00:01&lt;00:46, 926kB/s]
+  3%|2         | 1.22M/41.5M [00:01&lt;00:22, 1.86MB/s]
+  6%|5         | 2.45M/41.5M [00:01&lt;00:11, 3.66MB/s]
+ 10%|9         | 3.95M/41.5M [00:01&lt;00:07, 5.39MB/s]
+ 13%|#3        | 5.44M/41.5M [00:01&lt;00:05, 6.55MB/s]
+ 17%|#6        | 6.94M/41.5M [00:01&lt;00:04, 7.35MB/s]
+ 20%|##        | 8.44M/41.5M [00:02&lt;00:04, 7.90MB/s]
+ 24%|##3       | 9.92M/41.5M [00:02&lt;00:04, 8.27MB/s]
+ 28%|##7       | 11.4M/41.5M [00:02&lt;00:03, 8.54MB/s]
+ 31%|###1      | 12.9M/41.5M [00:02&lt;00:03, 8.72MB/s]
+ 35%|###4      | 14.4M/41.5M [00:02&lt;00:03, 8.82MB/s]
+ 38%|###8      | 15.9M/41.5M [00:02&lt;00:03, 8.92MB/s]
+ 42%|####1     | 17.4M/41.5M [00:03&lt;00:02, 8.99MB/s]
+ 45%|####5     | 18.9M/41.5M [00:03&lt;00:02, 9.03MB/s]
+ 49%|####9     | 20.4M/41.5M [00:03&lt;00:02, 9.05MB/s]
+ 53%|#####2    | 21.9M/41.5M [00:03&lt;00:02, 9.09MB/s]
+ 56%|#####6    | 23.3M/41.5M [00:03&lt;00:02, 9.09MB/s]
+ 60%|#####9    | 24.8M/41.5M [00:03&lt;00:01, 9.10MB/s]
+ 63%|######3   | 26.3M/41.5M [00:04&lt;00:01, 9.11MB/s]
+ 67%|######7   | 27.8M/41.5M [00:04&lt;00:01, 9.12MB/s]
+ 71%|#######   | 29.3M/41.5M [00:04&lt;00:01, 9.11MB/s]
+ 74%|#######4  | 30.8M/41.5M [00:04&lt;00:01, 9.13MB/s]
+ 78%|#######7  | 32.3M/41.5M [00:04&lt;00:01, 9.12MB/s]
+ 81%|########1 | 33.8M/41.5M [00:04&lt;00:00, 9.14MB/s]
+ 85%|########5 | 35.3M/41.5M [00:05&lt;00:00, 9.12MB/s]
+ 89%|########8 | 36.8M/41.5M [00:05&lt;00:00, 9.11MB/s]
+ 92%|#########2| 38.3M/41.5M [00:05&lt;00:00, 9.13MB/s]
+ 96%|#########5| 39.7M/41.5M [00:05&lt;00:00, 9.12MB/s]
+ 99%|#########9| 41.2M/41.5M [00:05&lt;00:00, 9.10MB/s]
+100%|##########| 41.5M/41.5M [00:05&lt;00:00, 7.47MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_paddle.html b/docs/how_to/compile_models/from_paddle.html
index 249c442e8..8f16cb950 100644
--- a/docs/how_to/compile_models/from_paddle.html
+++ b/docs/how_to/compile_models/from_paddle.html
@@ -469,7 +469,7 @@ A quick solution is</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>TVM prediction top-1 id: 282, class name:  282: &#39;tiger cat&#39;,
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  7.154 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  6.253 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-paddle-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/16269b77359771348d507395692524cf/from_paddle.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_paddle.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/from_pytorch.html b/docs/how_to/compile_models/from_pytorch.html
index ec5f15bf1..8f5e107f3 100644
--- a/docs/how_to/compile_models/from_pytorch.html
+++ b/docs/how_to/compile_models/from_pytorch.html
@@ -387,11 +387,10 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/resnet18-f37072fd.pth&quot; to /workspace/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
 
   0%|          | 0.00/44.7M [00:00&lt;?, ?B/s]
-  8%|7         | 3.53M/44.7M [00:00&lt;00:01, 36.7MB/s]
- 16%|#5        | 7.04M/44.7M [00:00&lt;00:01, 35.9MB/s]
- 41%|####1     | 18.3M/44.7M [00:00&lt;00:00, 72.4MB/s]
- 79%|#######8  | 35.1M/44.7M [00:00&lt;00:00, 113MB/s]
-100%|##########| 44.7M/44.7M [00:00&lt;00:00, 105MB/s]
+  9%|8         | 3.96M/44.7M [00:00&lt;00:01, 41.2MB/s]
+ 18%|#7        | 7.90M/44.7M [00:00&lt;00:00, 40.5MB/s]
+ 62%|######1   | 27.7M/44.7M [00:00&lt;00:00, 115MB/s]
+100%|##########| 44.7M/44.7M [00:00&lt;00:00, 120MB/s]
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/compile_models/from_tensorflow.html b/docs/how_to/compile_models/from_tensorflow.html
index 295127083..b8af42e69 100644
--- a/docs/how_to/compile_models/from_tensorflow.html
+++ b/docs/how_to/compile_models/from_tensorflow.html
@@ -612,6 +612,7 @@ banana (score = 0.00022)
 desk (score = 0.00019)
 </pre></div>
 </div>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  6.492 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-compile-models-from-tensorflow-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7f1d3d1b878694c201c614c807cdebc8/from_tensorflow.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">from_tensorflow.py</span></code></a></p>
diff --git a/docs/how_to/compile_models/sg_execution_times.html b/docs/how_to/compile_models/sg_execution_times.html
index ed50427ca..e4079b70b 100644
--- a/docs/how_to/compile_models/sg_execution_times.html
+++ b/docs/how_to/compile_models/sg_execution_times.html
@@ -300,18 +300,18 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-compile-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:19.617</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
+<p><strong>05:55.351</strong> total execution time for <strong>how_to_compile_models</strong> files:</p>
 <ul class="simple">
-<li><p><strong>01:07.154</strong>: <a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></li>
-<li><p><strong>00:59.786</strong>: <a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></li>
-<li><p><strong>00:57.244</strong>: <a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></li>
-<li><p><strong>00:32.067</strong>: <a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></li>
-<li><p><strong>00:23.948</strong>: <a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></li>
-<li><p><strong>00:22.611</strong>: <a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></li>
-<li><p><strong>00:20.958</strong>: <a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></li>
-<li><p><strong>00:19.583</strong>: <a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></li>
-<li><p><strong>00:13.801</strong>: <a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></li>
-<li><p><strong>00:02.463</strong>: <a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></li>
+<li><p><strong>01:06.492</strong>: <a class="reference internal" href="from_tensorflow.html#sphx-glr-how-to-compile-models-from-tensorflow-py"><span class="std std-ref">Compile Tensorflow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tensorflow.py</span></code>)</p></li>
+<li><p><strong>01:06.253</strong>: <a class="reference internal" href="from_paddle.html#sphx-glr-how-to-compile-models-from-paddle-py"><span class="std std-ref">Compile PaddlePaddle Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_paddle.py</span></code>)</p></li>
+<li><p><strong>00:57.441</strong>: <a class="reference internal" href="from_darknet.html#sphx-glr-how-to-compile-models-from-darknet-py"><span class="std std-ref">Compile YOLO-V2 and YOLO-V3 in DarkNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_darknet.py</span></code>)</p></li>
+<li><p><strong>00:38.879</strong>: <a class="reference internal" href="from_tflite.html#sphx-glr-how-to-compile-models-from-tflite-py"><span class="std std-ref">Compile TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_tflite.py</span></code>)</p></li>
+<li><p><strong>00:31.139</strong>: <a class="reference internal" href="from_oneflow.html#sphx-glr-how-to-compile-models-from-oneflow-py"><span class="std std-ref">Compile OneFlow Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_oneflow.py</span></code>)</p></li>
+<li><p><strong>00:28.658</strong>: <a class="reference internal" href="from_keras.html#sphx-glr-how-to-compile-models-from-keras-py"><span class="std std-ref">Compile Keras Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_keras.py</span></code>)</p></li>
+<li><p><strong>00:22.838</strong>: <a class="reference internal" href="from_mxnet.html#sphx-glr-how-to-compile-models-from-mxnet-py"><span class="std std-ref">Compile MXNet Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_mxnet.py</span></code>)</p></li>
+<li><p><strong>00:21.267</strong>: <a class="reference internal" href="from_coreml.html#sphx-glr-how-to-compile-models-from-coreml-py"><span class="std std-ref">Compile CoreML Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_coreml.py</span></code>)</p></li>
+<li><p><strong>00:19.939</strong>: <a class="reference internal" href="from_pytorch.html#sphx-glr-how-to-compile-models-from-pytorch-py"><span class="std std-ref">Compile PyTorch Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_pytorch.py</span></code>)</p></li>
+<li><p><strong>00:02.444</strong>: <a class="reference internal" href="from_onnx.html#sphx-glr-how-to-compile-models-from-onnx-py"><span class="std std-ref">Compile ONNX Models</span></a> (<code class="docutils literal notranslate"><span class="pre">from_onnx.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/deploy_models/deploy_model_on_android.html b/docs/how_to/deploy_models/deploy_model_on_android.html
index 0468832f8..826561e2a 100644
--- a/docs/how_to/deploy_models/deploy_model_on_android.html
+++ b/docs/how_to/deploy_models/deploy_model_on_android.html
@@ -627,7 +627,7 @@ to the remote android device.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  15.5115      15.4647      15.7794      15.4293       0.1051
+  15.6616      15.5844      15.9253      15.4984       0.1454
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
index db84ac035..7042198fc 100644
--- a/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
+++ b/docs/how_to/deploy_models/deploy_object_detection_pytorch.html
@@ -409,16 +409,15 @@ be unstable.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth&quot; to /workspace/.cache/torch/hub/checkpoints/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
 
   0%|          | 0.00/170M [00:00&lt;?, ?B/s]
-  2%|1         | 3.00M/170M [00:00&lt;00:05, 31.5MB/s]
-  4%|3         | 6.50M/170M [00:00&lt;00:04, 34.5MB/s]
- 16%|#6        | 27.8M/170M [00:00&lt;00:01, 120MB/s]
- 25%|##5       | 42.8M/170M [00:00&lt;00:00, 135MB/s]
- 40%|####      | 68.1M/170M [00:00&lt;00:00, 182MB/s]
- 50%|#####     | 85.4M/170M [00:00&lt;00:00, 167MB/s]
- 66%|######6   | 112M/170M [00:00&lt;00:00, 202MB/s]
- 78%|#######7  | 132M/170M [00:00&lt;00:00, 193MB/s]
- 89%|########8 | 150M/170M [00:00&lt;00:00, 184MB/s]
-100%|##########| 170M/170M [00:01&lt;00:00, 170MB/s]
+  2%|1         | 3.33M/170M [00:00&lt;00:05, 34.7MB/s]
+  4%|4         | 7.15M/170M [00:00&lt;00:04, 37.8MB/s]
+ 18%|#8        | 31.2M/170M [00:00&lt;00:01, 135MB/s]
+ 34%|###3      | 57.1M/170M [00:00&lt;00:00, 189MB/s]
+ 49%|####8     | 83.0M/170M [00:00&lt;00:00, 219MB/s]
+ 64%|######3   | 109M/170M [00:00&lt;00:00, 236MB/s]
+ 79%|#######9  | 134M/170M [00:00&lt;00:00, 247MB/s]
+ 94%|#########3| 160M/170M [00:00&lt;00:00, 253MB/s]
+100%|##########| 170M/170M [00:00&lt;00:00, 212MB/s]
 /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:3878: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
   for i in range(dim)
 /usr/local/lib/python3.7/dist-packages/torchvision/models/detection/anchor_utils.py:127: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the &#39;trunc&#39; function NOT &#39;floor&#39;). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode=&#39;trunc&#39;), or for actual floor division, use torch.div(a, b, rounding_mode=&#39;floor&#39;).
@@ -516,7 +515,7 @@ torchvision rcnn models.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Get 9 valid boxes
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  52.190 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  55.555 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-object-detection-pytorch-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7795da4b258c8feff986668b95ef57ad/deploy_object_detection_pytorch.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_object_detection_pytorch.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_prequantized.html b/docs/how_to/deploy_models/deploy_prequantized.html
index 832fd7478..57b0f1056 100644
--- a/docs/how_to/deploy_models/deploy_prequantized.html
+++ b/docs/how_to/deploy_models/deploy_prequantized.html
@@ -450,7 +450,9 @@ training. Other models require a full post training calibration.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading: &quot;https://download.pytorch.org/models/mobilenet_v2-b0353104.pth&quot; to /workspace/.cache/torch/hub/checkpoints/mobilenet_v2-b0353104.pth
 
   0%|          | 0.00/13.6M [00:00&lt;?, ?B/s]
-100%|##########| 13.6M/13.6M [00:00&lt;00:00, 170MB/s]
+ 27%|##7       | 3.69M/13.6M [00:00&lt;00:00, 38.4MB/s]
+ 54%|#####4    | 7.36M/13.6M [00:00&lt;00:00, 34.1MB/s]
+100%|##########| 13.6M/13.6M [00:00&lt;00:00, 56.7MB/s]
 </pre></div>
 </div>
 </div>
@@ -544,7 +546,7 @@ output values are identical out of 1000 outputs from mobilenet v2.</p>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  90.2020      90.1611      91.4664      89.9314       0.2113
+  90.6288      90.3101      109.3293     90.1389       1.9611
 </pre></div>
 </div>
 <div class="admonition note">
@@ -583,7 +585,7 @@ This includes support for the VNNI 8 bit dot product instruction (CascadeLake or
 <div class="section" id="deploy-a-quantized-tflite-model">
 <h2>Deploy a quantized TFLite Model<a class="headerlink" href="#deploy-a-quantized-tflite-model" title="Permalink to this headline">¶</a></h2>
 <p>TODO</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  5.324 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  7.105 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/fb8217c13f4351224c6cf3aacf1a87fc/deploy_prequantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized.py</span></code></a></p>
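
The "Execution time summary" tables in these deploy_* pages come from TVM's
time evaluator. A minimal sketch, assuming `module` is a graph executor module
on device `dev` as in the deploy_prequantized tutorial:

    import numpy as np

    # Run the "run" entry point repeatedly on the device; .results holds
    # one time (in seconds) per repeat.
    ftimer = module.module.time_evaluator("run", dev, number=1, repeat=100)
    prof_res = np.array(ftimer().results) * 1e3  # milliseconds
    print("mean %.4f ms, median %.4f ms, std %.4f ms"
          % (np.mean(prof_res), np.median(prof_res), np.std(prof_res)))
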
diff --git a/docs/how_to/deploy_models/deploy_prequantized_tflite.html b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
index 2f62f5c6c..a02a7f236 100644
--- a/docs/how_to/deploy_models/deploy_prequantized_tflite.html
+++ b/docs/how_to/deploy_models/deploy_prequantized_tflite.html
@@ -545,7 +545,7 @@ TFLite Top-5 labels: [387 102 386 341 349]
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  117.2626     116.6744     133.9423     115.2772      2.1643
+  119.3850     119.3810     120.4711     118.5099      0.3716
 </pre></div>
 </div>
 <div class="admonition note">
@@ -573,7 +573,7 @@ network for ARM CPU</span></a>.</p></li>
 </ul>
 </div></blockquote>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  4.665 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  58.792 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-prequantized-tflite-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/56691c7a27d45da61d112276334640d3/deploy_prequantized_tflite.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_prequantized_tflite.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_quantized.html b/docs/how_to/deploy_models/deploy_quantized.html
index 41abcc1e0..54f7fe676 100644
--- a/docs/how_to/deploy_models/deploy_quantized.html
+++ b/docs/how_to/deploy_models/deploy_quantized.html
@@ -482,7 +482,7 @@ for calibration. But the accuracy might be impacted.</p>
   DeprecationWarning,
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  28.814 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  27.202 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-quantized-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/7810ecf51bfc05f7d5e8a400ac3e815d/deploy_quantized.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_quantized.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
index d2bf06e5f..c8afa9287 100644
--- a/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
+++ b/docs/how_to/deploy_models/deploy_ssd_gluoncv.html
@@ -415,22 +415,23 @@ to your device.</p>
 Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_512_resnet50_v1_voc-9c8b225a.zip...
 
   0%|          | 0/132723 [00:00&lt;?, ?KB/s]
-  5%|4         | 6076/132723 [00:00&lt;00:02, 60755.86KB/s]
- 11%|#1        | 14687/132723 [00:00&lt;00:01, 75651.01KB/s]
- 18%|#7        | 23341/132723 [00:00&lt;00:01, 80618.32KB/s]
- 24%|##4       | 32060/132723 [00:00&lt;00:01, 83209.49KB/s]
- 31%|###       | 40689/132723 [00:00&lt;00:01, 84317.29KB/s]
- 37%|###7      | 49378/132723 [00:00&lt;00:00, 85188.79KB/s]
- 44%|####3     | 57961/132723 [00:00&lt;00:00, 85397.29KB/s]
- 50%|#####     | 66554/132723 [00:00&lt;00:00, 85563.32KB/s]
- 57%|#####6    | 75244/132723 [00:00&lt;00:00, 85979.77KB/s]
- 63%|######3   | 83950/132723 [00:01&lt;00:00, 86310.28KB/s]
- 70%|######9   | 92634/132723 [00:01&lt;00:00, 86470.65KB/s]
- 76%|#######6  | 101328/132723 [00:01&lt;00:00, 86610.87KB/s]
- 83%|########2 | 110062/132723 [00:01&lt;00:00, 86830.63KB/s]
- 89%|########9 | 118777/132723 [00:01&lt;00:00, 86923.61KB/s]
- 96%|#########6| 127514/132723 [00:01&lt;00:00, 87055.62KB/s]
-100%|##########| 132723/132723 [00:01&lt;00:00, 85017.75KB/s]
+  4%|4         | 5835/132723 [00:00&lt;00:02, 58343.66KB/s]
+ 10%|#         | 13882/132723 [00:00&lt;00:01, 71356.41KB/s]
+ 17%|#6        | 21949/132723 [00:00&lt;00:01, 75603.29KB/s]
+ 23%|##2       | 30108/132723 [00:00&lt;00:01, 77964.00KB/s]
+ 29%|##8       | 37905/132723 [00:00&lt;00:01, 75858.11KB/s]
+ 35%|###4      | 46068/132723 [00:00&lt;00:01, 77782.99KB/s]
+ 41%|####      | 54213/132723 [00:00&lt;00:00, 78961.94KB/s]
+ 47%|####6     | 62282/132723 [00:00&lt;00:00, 79505.79KB/s]
+ 53%|#####3    | 70354/132723 [00:00&lt;00:00, 79880.70KB/s]
+ 59%|#####9    | 78485/132723 [00:01&lt;00:00, 80311.83KB/s]
+ 65%|######5   | 86542/132723 [00:01&lt;00:00, 80388.24KB/s]
+ 71%|#######1  | 94687/132723 [00:01&lt;00:00, 80709.03KB/s]
+ 77%|#######7  | 102760/132723 [00:01&lt;00:00, 79536.35KB/s]
+ 83%|########3 | 110719/132723 [00:01&lt;00:00, 79174.39KB/s]
+ 89%|########9 | 118640/132723 [00:01&lt;00:00, 73478.34KB/s]
+ 96%|#########5| 127131/132723 [00:01&lt;00:00, 76715.07KB/s]
+100%|##########| 132723/132723 [00:01&lt;00:00, 77785.71KB/s]
 </pre></div>
 </div>
 <p>Create TVM runtime and do inference
@@ -475,7 +476,7 @@ Downloading /workspace/.mxnet/models/ssd_512_resnet50_v1_voc-9c8b225a.zip from h
 </pre></div>
 </div>
 <img alt="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" class="sphx-glr-single-img" src="../../_images/sphx_glr_deploy_ssd_gluoncv_001.png" />
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  15.000 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  17.308 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-deploy-models-deploy-ssd-gluoncv-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/cccb17d28e5e8b2e94ea8cd5ec59f6ed/deploy_ssd_gluoncv.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">deploy_ssd_gluoncv.py</span></code></a></p>
diff --git a/docs/how_to/deploy_models/sg_execution_times.html b/docs/how_to/deploy_models/sg_execution_times.html
index 99c62fbe4..90efe6c6d 100644
--- a/docs/how_to/deploy_models/sg_execution_times.html
+++ b/docs/how_to/deploy_models/sg_execution_times.html
@@ -300,16 +300,16 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-deploy-models-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>10:35.912</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
+<p><strong>10:36.996</strong> total execution time for <strong>how_to_deploy_models</strong> files:</p>
 <ul class="simple">
-<li><p><strong>02:52.190</strong>: <a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></li>
-<li><p><strong>02:15.000</strong>: <a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></li>
-<li><p><strong>02:04.665</strong>: <a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></li>
-<li><p><strong>01:28.814</strong>: <a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></li>
-<li><p><strong>01:05.324</strong>: <a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></li>
-<li><p><strong>00:27.679</strong>: <a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></li>
-<li><p><strong>00:22.056</strong>: <a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></li>
-<li><p><strong>00:00.184</strong>: <a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></li>
+<li><p><strong>02:55.555</strong>: <a class="reference internal" href="deploy_object_detection_pytorch.html#sphx-glr-how-to-deploy-models-deploy-object-detection-pytorch-py"><span class="std std-ref">Compile PyTorch Object Detection Models</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_object_detection_pytorch.py</span></code>)</p></li>
+<li><p><strong>02:17.308</strong>: <a class="reference internal" href="deploy_ssd_gluoncv.html#sphx-glr-how-to-deploy-models-deploy-ssd-gluoncv-py"><span class="std std-ref">Deploy Single Shot Multibox Detector(SSD) model</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_ssd_gluoncv.py</span></code>)</p></li>
+<li><p><strong>01:58.792</strong>: <a class="reference internal" href="deploy_prequantized_tflite.html#sphx-glr-how-to-deploy-models-deploy-prequantized-tflite-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM - Part 3 (TFLite)</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized_tflite.py</span></code>)</p></li>
+<li><p><strong>01:27.202</strong>: <a class="reference internal" href="deploy_quantized.html#sphx-glr-how-to-deploy-models-deploy-quantized-py"><span class="std std-ref">Deploy a Quantized Model on Cuda</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_quantized.py</span></code>)</p></li>
+<li><p><strong>01:07.105</strong>: <a class="reference internal" href="deploy_prequantized.html#sphx-glr-how-to-deploy-models-deploy-prequantized-py"><span class="std std-ref">Deploy a Framework-prequantized Model with TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_prequantized.py</span></code>)</p></li>
+<li><p><strong>00:28.431</strong>: <a class="reference internal" href="deploy_model_on_android.html#sphx-glr-how-to-deploy-models-deploy-model-on-android-py"><span class="std std-ref">Deploy the Pretrained Model on Android</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_android.py</span></code>)</p></li>
+<li><p><strong>00:22.401</strong>: <a class="reference internal" href="deploy_model_on_rasp.html#sphx-glr-how-to-deploy-models-deploy-model-on-rasp-py"><span class="std std-ref">Deploy the Pretrained Model on Raspberry Pi</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_model_on_rasp.py</span></code>)</p></li>
+<li><p><strong>00:00.202</strong>: <a class="reference internal" href="deploy_sparse.html#sphx-glr-how-to-deploy-models-deploy-sparse-py"><span class="std std-ref">Deploy a Hugging Face Pruned Model on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_sparse.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/extend_tvm/bring_your_own_datatypes.html b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
index 6c3b52fdb..bf4e83485 100644
--- a/docs/how_to/extend_tvm/bring_your_own_datatypes.html
+++ b/docs/how_to/extend_tvm/bring_your_own_datatypes.html
@@ -590,7 +590,7 @@ In this alpha state of the Bring Your Own Datatypes framework, we have not imple
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zipf0c0bb47-4970-4a0b-b2bc-ac6cf191dd17 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Downloading /workspace/.mxnet/models/mobilenet0.25-9f83e440.zip6607e628-6104-4e6e-bc94-96a761f538ed from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenet0.25-9f83e440.zip...
 </pre></div>
 </div>
 <p>It’s easy to execute MobileNet with native TVM:</p>
diff --git a/docs/how_to/extend_tvm/sg_execution_times.html b/docs/how_to/extend_tvm/sg_execution_times.html
index 606590ca5..0b4155f8e 100644
--- a/docs/how_to/extend_tvm/sg_execution_times.html
+++ b/docs/how_to/extend_tvm/sg_execution_times.html
@@ -300,12 +300,12 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-extend-tvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:38.104</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
+<p><strong>00:39.683</strong> total execution time for <strong>how_to_extend_tvm</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:34.660</strong>: <a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></li>
-<li><p><strong>00:02.226</strong>: <a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></li>
-<li><p><strong>00:01.033</strong>: <a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></li>
-<li><p><strong>00:00.185</strong>: <a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></li>
+<li><p><strong>00:35.811</strong>: <a class="reference internal" href="bring_your_own_datatypes.html#sphx-glr-how-to-extend-tvm-bring-your-own-datatypes-py"><span class="std std-ref">Bring Your Own Datatypes to TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">bring_your_own_datatypes.py</span></code>)</p></li>
+<li><p><strong>00:02.338</strong>: <a class="reference internal" href="use_pass_instrument.html#sphx-glr-how-to-extend-tvm-use-pass-instrument-py"><span class="std std-ref">How to Use TVM Pass Instrument</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_instrument.py</span></code>)</p></li>
+<li><p><strong>00:01.323</strong>: <a class="reference internal" href="use_pass_infra.html#sphx-glr-how-to-extend-tvm-use-pass-infra-py"><span class="std std-ref">How to Use TVM Pass Infra</span></a> (<code class="docutils literal notranslate"><span class="pre">use_pass_infra.py</span></code>)</p></li>
+<li><p><strong>00:00.211</strong>: <a class="reference internal" href="low_level_custom_pass.html#sphx-glr-how-to-extend-tvm-low-level-custom-pass-py"><span class="std std-ref">Writing a Customized Pass</span></a> (<code class="docutils literal notranslate"><span class="pre">low_level_custom_pass.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/extend_tvm/use_pass_instrument.html b/docs/how_to/extend_tvm/use_pass_instrument.html
index 8e74b0c17..b7f539954 100644
--- a/docs/how_to/extend_tvm/use_pass_instrument.html
+++ b/docs/how_to/extend_tvm/use_pass_instrument.html
@@ -486,10 +486,10 @@ profile the execution time of each passes.</p>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 6080us [6080us] (45.41%; 45.41%)
-FoldScaleAxis: 7310us [5us] (54.59%; 54.59%)
-        FoldConstant: 7305us [1471us] (54.56%; 99.93%)
-                InferType: 5834us [5834us] (43.57%; 79.86%)
+InferType: 6585us [6585us] (45.67%; 45.67%)
+FoldScaleAxis: 7835us [5us] (54.33%; 54.33%)
+        FoldConstant: 7829us [1572us] (54.29%; 99.93%)
+                InferType: 6257us [6257us] (43.39%; 79.93%)
 </pre></div>
 </div>
 </div>
@@ -512,10 +512,10 @@ Refer to following sections and <a class="reference internal" href="../../refere
 </div>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Printing results of timing profile...
-InferType: 5872us [5872us] (44.48%; 44.48%)
-FoldScaleAxis: 7329us [4us] (55.52%; 55.52%)
-        FoldConstant: 7325us [1552us] (55.49%; 99.94%)
-                InferType: 5773us [5773us] (43.73%; 78.82%)
+InferType: 6258us [6258us] (44.54%; 44.54%)
+FoldScaleAxis: 7794us [4us] (55.46%; 55.46%)
+        FoldConstant: 7789us [1607us] (55.43%; 99.94%)
+                InferType: 6183us [6183us] (44.00%; 79.37%)
 </pre></div>
 </div>
 <p>Register empty list to clear existing instruments.</p>
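For context, the per-pass timing profiles updated above (InferType, FoldScaleAxis, FoldConstant) are produced by TVM's PassTimingInstrument. A minimal runnable sketch, assuming a toy Relay module rather than the tutorial's exact setup:

    import tvm
    from tvm import relay
    from tvm.ir.instrument import PassTimingInstrument

    x = relay.var("x", shape=(1, 3))
    mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

    timing = PassTimingInstrument()
    with tvm.transform.PassContext(opt_level=3, instruments=[timing]):
        relay.transform.InferType()(mod)
        profile = timing.render()  # read before the context exits
    print(profile)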
diff --git a/docs/how_to/optimize_operators/opt_conv_cuda.html b/docs/how_to/optimize_operators/opt_conv_cuda.html
index d92901fa0..cdb00ae42 100644
--- a/docs/how_to/optimize_operators/opt_conv_cuda.html
+++ b/docs/how_to/optimize_operators/opt_conv_cuda.html
@@ -534,7 +534,7 @@ latency of convolution.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 35.924908 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Convolution: 54.171603 ms
 </pre></div>
 </div>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-optimize-operators-opt-conv-cuda-py">
diff --git a/docs/how_to/optimize_operators/opt_conv_tensorcore.html b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
index 06d9c9921..6bec5503a 100644
--- a/docs/how_to/optimize_operators/opt_conv_tensorcore.html
+++ b/docs/how_to/optimize_operators/opt_conv_tensorcore.html
@@ -878,7 +878,7 @@ be able to run on our build server</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 13.361755 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>conv2d with tensor core: 8.954840 ms
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/optimize_operators/opt_gemm.html b/docs/how_to/optimize_operators/opt_gemm.html
index 9a5d2ef4d..723d6a580 100644
--- a/docs/how_to/optimize_operators/opt_gemm.html
+++ b/docs/how_to/optimize_operators/opt_gemm.html
@@ -431,8 +431,8 @@ Then we write a baseline implementation, the simplest way to write a matrix mult
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.017828
-Baseline: 3.377547
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018505
+Baseline: 3.443746
 </pre></div>
 </div>
 <p>In TVM, we can always inspect lower level IR to debug or optimize our schedule.
@@ -494,7 +494,7 @@ fill 32 * 32 * sizeof(float) which is 4KB in the cache whose total size is 32KB
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.296004
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt1: 0.306762
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -563,7 +563,7 @@ vastly.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.329275
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt2: 0.332624
 </pre></div>
 </div>
 <p>Here is the generated IR after vectorization.</p>
@@ -626,7 +626,7 @@ the access pattern for A matrix is more cache friendly.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.120399
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt3: 0.116654
 </pre></div>
 </div>
 <p>Here is the generated IR after loop permutation.</p>
@@ -711,7 +711,7 @@ flattening.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.110560
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt4: 0.110416
 </pre></div>
 </div>
 <p>Here is the generated IR after array packing.</p>
@@ -799,7 +799,7 @@ write to C when all the block results are ready.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.110722
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt5: 0.111222
 </pre></div>
 </div>
 <p>Here is the generated IR after blocking.</p>
@@ -891,7 +891,7 @@ write to C when all the block results are ready.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.145105
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Opt6: 0.145492
 </pre></div>
 </div>
 <p>Here is the generated IR after parallelization.</p>
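For context, the Baseline/Opt1-Opt6 numbers updated above come from progressively rescheduling a single TE matrix multiply. A minimal sketch of the baseline compute plus the first blocking step, using the 32x32 tile size the hunk context above refers to (the later vectorize/permute/pack/parallel steps are omitted):

    import tvm
    from tvm import te

    M = N = K = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda x, y: te.sum(A[x, k] * B[k, y], axis=k), name="C")

    s = te.create_schedule(C.op)
    # Blocking: 32x32 output tiles so each tile's working set stays in cache.
    xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], 32, 32)
    ko, ki = s[C].split(k, factor=4)
    s[C].reorder(xo, yo, ko, ki, xi, yi)
    func = tvm.build(s, [A, B, C], target="llvm")

Each of the Opt timings in the diff corresponds to adding one more such scheduling primitive to this same compute definition.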
diff --git a/docs/how_to/optimize_operators/sg_execution_times.html b/docs/how_to/optimize_operators/sg_execution_times.html
index 9c2b4c29d..1786f54d4 100644
--- a/docs/how_to/optimize_operators/sg_execution_times.html
+++ b/docs/how_to/optimize_operators/sg_execution_times.html
@@ -300,11 +300,11 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-optimize-operators-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:34.964</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
+<p><strong>00:35.245</strong> total execution time for <strong>how_to_optimize_operators</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:32.110</strong>: <a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></li>
-<li><p><strong>00:01.615</strong>: <a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></li>
-<li><p><strong>00:01.238</strong>: <a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></li>
+<li><p><strong>00:32.570</strong>: <a class="reference internal" href="opt_gemm.html#sphx-glr-how-to-optimize-operators-opt-gemm-py"><span class="std std-ref">How to optimize GEMM on CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_gemm.py</span></code>)</p></li>
+<li><p><strong>00:01.459</strong>: <a class="reference internal" href="opt_conv_tensorcore.html#sphx-glr-how-to-optimize-operators-opt-conv-tensorcore-py"><span class="std std-ref">How to optimize convolution using TensorCores</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_tensorcore.py</span></code>)</p></li>
+<li><p><strong>00:01.216</strong>: <a class="reference internal" href="opt_conv_cuda.html#sphx-glr-how-to-optimize-operators-opt-conv-cuda-py"><span class="std std-ref">How to optimize convolution on GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">opt_conv_cuda.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
index 1e053ced4..19f84dfbe 100644
--- a/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
+++ b/docs/how_to/tune_with_autoscheduler/sg_execution_times.html
@@ -300,14 +300,14 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autoscheduler-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>05:16.371</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
+<p><strong>05:16.489</strong> total execution time for <strong>how_to_tune_with_autoscheduler</strong> files:</p>
 <ul class="simple">
-<li><p><strong>02:37.623</strong>: <a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></li>
-<li><p><strong>01:19.340</strong>: <a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></li>
-<li><p><strong>00:42.014</strong>: <a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></li>
-<li><p><strong>00:19.946</strong>: <a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></li>
-<li><p><strong>00:09.090</strong>: <a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></li>
-<li><p><strong>00:08.359</strong>: <a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></li>
+<li><p><strong>02:38.683</strong>: <a class="reference internal" href="tune_conv2d_layer_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py"><span class="std std-ref">Auto-scheduling a Convolution Layer for GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_layer_cuda.py</span></code>)</p></li>
+<li><p><strong>01:20.354</strong>: <a class="reference internal" href="tune_network_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-x86-py"><span class="std std-ref">Auto-scheduling a Neural Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_x86.py</span></code>)</p></li>
+<li><p><strong>00:42.530</strong>: <a class="reference internal" href="tune_network_cuda.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-cuda-py"><span class="std std-ref">Auto-scheduling a Neural Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_cuda.py</span></code>)</p></li>
+<li><p><strong>00:17.587</strong>: <a class="reference internal" href="tune_sparse_x86.html#sphx-glr-how-to-tune-with-autoscheduler-tune-sparse-x86-py"><span class="std std-ref">Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_sparse_x86.py</span></code>)</p></li>
+<li><p><strong>00:08.860</strong>: <a class="reference internal" href="tune_network_mali.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-mali-py"><span class="std std-ref">Auto-scheduling a Neural Network for mali GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_mali.py</span></code>)</p></li>
+<li><p><strong>00:08.475</strong>: <a class="reference internal" href="tune_network_arm.html#sphx-glr-how-to-tune-with-autoscheduler-tune-network-arm-py"><span class="std std-ref">Auto-scheduling a Neural Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_network_arm.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
index 4142c4fc7..bdfd93eb2 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_conv2d_layer_cuda.html
@@ -984,7 +984,7 @@ cooperative fetching, unrolling and operator fusion.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.370 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 0.350 ms
 </pre></div>
 </div>
 </div>
@@ -1549,7 +1549,7 @@ In the example below we resume the status and do 5 more trials.</p>
 Get devices for measurement successfully!
 </pre></div>
 </div>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  37.623 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 2 minutes  38.683 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-conv2d-layer-cuda-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e3e540f3b477c0c52d8eb73e674e8ffd/tune_conv2d_layer_cuda.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_conv2d_layer_cuda.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
index efcfcbf6e..e88b748ae 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_cuda.html
@@ -878,7 +878,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  10.0358      10.0072      10.1016       9.9986       0.0467
+   9.6087       9.6029       9.6330       9.5902       0.0179
 </pre></div>
 </div>
 </div>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
index 5e394bf46..7ac27acb9 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_network_x86.html
@@ -897,7 +897,7 @@ so we can read the log file and load the best schedules.</p>
 Evaluate inference time cost...
 Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
-  753.0981     753.2482     753.2847     752.7613      0.2386
+  758.6266     758.8503     761.0661     755.9634      2.0892
 </pre></div>
 </div>
 </div>
@@ -919,7 +919,7 @@ to learn how to use the RPC Tracker and RPC Server.
 To use the RPC Tracker in auto-scheduler, replace the runner in <code class="code docutils literal notranslate"><span class="pre">TuningOptions</span></code>
 with <a class="reference internal" href="../../reference/api/python/auto_scheduler.html#tvm.auto_scheduler.RPCRunner" title="tvm.auto_scheduler.RPCRunner"><code class="xref any py py-class docutils literal notranslate"><span class="pre">auto_scheduler.RPCRunner</span></code></a>.</p></li>
 </ol>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  19.340 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  20.354 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autoscheduler-tune-network-x86-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../../_downloads/e416b94ca1090b0897c0f6e0df95b911/tune_network_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tune_network_x86.py</span></code></a></p>
diff --git a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
index 73d1a4253..04c1633fd 100644
--- a/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
+++ b/docs/how_to/tune_with_autoscheduler/tune_sparse_x86.html
@@ -600,7 +600,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
              placeholder_4: Buffer(placeholder_14: Pointer(float32), float32, [65536], []),
              compute: Buffer(compute_2: Pointer(float32), float32, [65536], [])}
   buffer_map = {placeholder_5: placeholder, placeholder_6: placeholder_1, placeholder_7: placeholder_2, placeholder_8: placeholder_3, placeholder_9: placeholder_4, compute_1: compute}
-  preflattened_buffer_map = {placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_7: placeholder_19: Buffer(placeholder_12, int32, [4916], []), compute_1: compute_3: Buffer(compute_2, float32, [128, 512], [])} {
+  preflattened_buffer_map = {compute_1: compute_3: Buffer(compute_2, float32, [128, 512], []), placeholder_8: placeholder_15: Buffer(placeholder_13, int32, [33], []), placeholder_5: placeholder_16: Buffer(placeholder_10, float32, [128, 256], []), placeholder_6: placeholder_17: Buffer(placeholder_11, float32, [4916, 16, 1], []), placeholder_9: placeholder_18: Buffer(placeholder_14, float32, [128, 512], []), placeholder_7: placeholder_19: Buffer(placeholder_12, int32, [4916], [])} {
   for (i0.outer.i1.outer.fused: int32, 0, 32) &quot;parallel&quot; {
     allocate(compute_4: Pointer(global float32), float32, [2048]), storage_scope = global {
       for (nb_j.inner: int32, 0, 2) {
@@ -708,7 +708,7 @@ layout transformation, parallelization, vectorization, unrolling, and operator f
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 1.847 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 1.838 ms
 </pre></div>
 </div>
 <div class="admonition note">
diff --git a/docs/how_to/tune_with_autotvm/sg_execution_times.html b/docs/how_to/tune_with_autotvm/sg_execution_times.html
index 4d42b77dc..aaed1860c 100644
--- a/docs/how_to/tune_with_autotvm/sg_execution_times.html
+++ b/docs/how_to/tune_with_autotvm/sg_execution_times.html
@@ -300,13 +300,13 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-tune-with-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:44.445</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
+<p><strong>00:44.175</strong> total execution time for <strong>how_to_tune_with_autotvm</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:43.616</strong>: <a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></li>
-<li><p><strong>00:00.213</strong>: <a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></li>
-<li><p><strong>00:00.213</strong>: <a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></li>
-<li><p><strong>00:00.203</strong>: <a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></li>
-<li><p><strong>00:00.199</strong>: <a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></li>
+<li><p><strong>00:43.281</strong>: <a class="reference internal" href="tune_conv2d_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-conv2d-cuda-py"><span class="std std-ref">Tuning High Performance Convolution on NVIDIA GPUs</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_conv2d_cuda.py</span></code>)</p></li>
+<li><p><strong>00:00.232</strong>: <a class="reference internal" href="tune_relay_x86.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-x86-py"><span class="std std-ref">Auto-tuning a Convolutional Network for x86 CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_x86.py</span></code>)</p></li>
+<li><p><strong>00:00.228</strong>: <a class="reference internal" href="tune_relay_cuda.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-cuda-py"><span class="std std-ref">Auto-tuning a Convolutional Network for NVIDIA GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_cuda.py</span></code>)</p></li>
+<li><p><strong>00:00.218</strong>: <a class="reference internal" href="tune_relay_arm.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-arm-py"><span class="std std-ref">Auto-tuning a Convolutional Network for ARM CPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_arm.py</span></code>)</p></li>
+<li><p><strong>00:00.217</strong>: <a class="reference internal" href="tune_relay_mobile_gpu.html#sphx-glr-how-to-tune-with-autotvm-tune-relay-mobile-gpu-py"><span class="std std-ref">Auto-tuning a Convolutional Network for Mobile GPU</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_mobile_gpu.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
index f7a796356..38d5d8f50 100644
--- a/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
+++ b/docs/how_to/tune_with_autotvm/tune_conv2d_cuda.html
@@ -1142,8 +1142,8 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 4, 32]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 1, 128]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2885496
-No: 6   GFLOPS: 97.01/97.01     result: MeasureResult(costs=(0.002386400625,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.612271785736084, timestamp=1654294869.017783)       [(&#39;tile_f&#39;, [-1, 1, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,3754080
-No: 7   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 6   GFLOPS: 42.39/42.39     result: MeasureResult(costs=(0.005461075315789474,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.610039234161377, timestamp=1654298603.3978002)        [(&#39;tile_f&#39;, [-1, 1, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 4]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,3754080
+No: 7   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1266,7 +1266,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 16, 32]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 256, 1]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6225319
-No: 8   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 8   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1389,7 +1389,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 1, 32]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 64]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,943546
-No: 9   GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 9   GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1512,7 +1512,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 4, 16, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 16, 32]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2868708
-No: 10  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 10  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 142, in build
     res = future.result()
   File &quot;/usr/lib/python3.7/concurrent/futures/_base.py&quot;, line 435, in result
@@ -1530,7 +1530,7 @@ No: 10  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
 TimeoutError
 
         [(&#39;tile_f&#39;, [-1, 32, 2, 4]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 4, 2]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4691833
-No: 11  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 11  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1653,7 +1653,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 2, 64]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 4]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,1042124
-No: 12  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 12  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1776,7 +1776,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 32, 1, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 32, 16]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,10013405
-No: 13  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 13  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -1899,7 +1899,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 8, 8, 2]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 7, 1]), (&#39;tile_rc&#39;, [-1, 4, 32]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6732082
-No: 14  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 14  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -2022,7 +2022,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 4, 32]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 1, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 128]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 1)],None,7536735
-No: 15  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 15  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -2145,7 +2145,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 1, 4]), (&#39;tile_y&#39;, [-1, 1, 1, 7]), (&#39;tile_x&#39;, [-1, 1, 1, 7]), (&#39;tile_rc&#39;, [-1, 128, 4]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 1, 1]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 0)],None,482121
-No: 16  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 16  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -2268,7 +2268,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 2, 1, 16]), (&#39;tile_y&#39;, [-1, 1, 7, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 32, 8]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 512), (&#39;unroll_explicit&#39;, 0)],None,2824525
-No: 17  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 17  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -2391,7 +2391,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 64, 1, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 8, 8]), (&#39;tile_ry&#39;, [-1, 1, 3]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 0)],None,4559286
-No: 18  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 18  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 571, in __call__
     func, arg_info = _build_func_common(measure_input, self.runtime, **kwargs)
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 523, in _build_func_common
@@ -2514,7 +2514,7 @@ Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 854, in verify_pass
     raise InstantiationError(&quot;Skipped because of invalid gpu kernel&quot;)
 tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel        [(&#39;tile_f&#39;, [-1, 1, 32, 16]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 512]), (&#39;tile_ry&#39;, [-1, 3, 1]), (&#39;tile_rx&#39;, [-1, 3, 1]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9677544
-No: 19  GFLOPS: 0.00/97.01      result: Traceback (most recent call last):
+No: 19  GFLOPS: 0.00/42.39      result: Traceback (most recent call last):
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 721, in __call__
     yield remote, remote.load_module(os.path.split(build_result.filename)[1])
   File &quot;/workspace/python/tvm/autotvm/measure/measure_methods.py&quot;, line 685, in run_through_rpc
@@ -2602,7 +2602,7 @@ tvm._ffi.base.TVMError: Traceback (most recent call last):
   15: _PyEval_EvalFrameDefault
   14: 0x0000000000537c30
   13: _PyObject_FastCallKeywords
-  12: 0x00007f602848afa2
+  12: 0x00007f37cee58fa2
   11: _ctypes_callproc
   10: ffi_call
   9: ffi_call_unix64
@@ -2667,7 +2667,7 @@ Traceback (most recent call last):
   21: _PyFunction_FastCallKeywords
   20: _PyEval_EvalFrameDefault
   19: _PyFunction_FastCall      [(&#39;tile_f&#39;, [-1, 8, 2, 16]), (&#39;tile_y&#39;, [-1, 7, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 1, 1]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 0), (&#39;unroll_explicit&#39;, 1)],None,6390073
-No: 20  GFLOPS: 142.13/142.13   result: MeasureResult(costs=(0.0016287441699999999,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4211044311523438, timestamp=1654294895.6017895)      [(&#39;tile_f&#39;, [-1, 1, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9881539
+No: 20  GFLOPS: 145.03/145.03   result: MeasureResult(costs=(0.0015962098,), error_no=MeasureErrorNo.NO_ERROR, all_cost=1.4588935375213623, timestamp=1654298629.9176986)       [(&#39;tile_f&#39;, [-1, 1, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9881539
 </pre></div>
 </div>
 <p>Finally we can inspect the best config from log file, check correctness,
@@ -2706,7 +2706,7 @@ and measure running time.</p>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Best config:
 [(&#39;tile_f&#39;, [-1, 1, 4, 1]), (&#39;tile_y&#39;, [-1, 1, 1, 1]), (&#39;tile_x&#39;, [-1, 7, 1, 1]), (&#39;tile_rc&#39;, [-1, 4, 1]), (&#39;tile_ry&#39;, [-1, 1, 1]), (&#39;tile_rx&#39;, [-1, 1, 3]), (&#39;auto_unroll_max_step&#39;, 1500), (&#39;unroll_explicit&#39;, 1)],None,9881539
-Time cost of this operator: 0.002030
+Time cost of this operator: 0.002019
 </pre></div>
 </div>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-tune-with-autotvm-tune-conv2d-cuda-py">
diff --git a/docs/how_to/work_with_microtvm/index.html b/docs/how_to/work_with_microtvm/index.html
index 01e022a54..d9dd4cc51 100644
--- a/docs/how_to/work_with_microtvm/index.html
+++ b/docs/how_to/work_with_microtvm/index.html
@@ -215,6 +215,7 @@
 <li class="toctree-l3"><a class="reference internal" href="micro_ethosu.html">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_reference_vm.html">microTVM Reference Virtual Machines</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
 </ul>
 </li>
@@ -352,9 +353,15 @@ demonstrate how to tune and deploy models with microTVM.</p>
 </div>
 </div><div class="toctree-wrapper compound">
 </div>
-<div class="sphx-glr-thumbcontainer" tooltip="This tutorial explains how to compile a tiny model for a micro device, build a program on Zephy..."><div class="figure align-default" id="id5">
+<div class="sphx-glr-thumbcontainer" tooltip="This tutorial shows how MobileNetV1 models can be trained to fit on embedded devices, and how t..."><div class="figure align-default" id="id5">
+<img alt="../../_images/sphx_glr_micro_train_thumb.png" src="../../_images/sphx_glr_micro_train_thumb.png" />
+<p class="caption"><span class="caption-text"><a class="reference internal" href="micro_train.html#sphx-glr-how-to-work-with-microtvm-micro-train-py"><span class="std std-ref">Training Vision Models for microTVM on Arduino</span></a></span><a class="headerlink" href="#id5" title="Permalink to this image">¶</a></p>
+</div>
+</div><div class="toctree-wrapper compound">
+</div>
+<div class="sphx-glr-thumbcontainer" tooltip="This tutorial explains how to compile a tiny model for a micro device, build a program on Zephy..."><div class="figure align-default" id="id6">
 <img alt="../../_images/sphx_glr_micro_tvmc_thumb.png" src="../../_images/sphx_glr_micro_tvmc_thumb.png" />
-<p class="caption"><span class="caption-text"><a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a></span><a class="headerlink" href="#id5" title="Permalink to this image">¶</a></p>
+<p class="caption"><span class="caption-text"><a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a></span><a class="headerlink" href="#id6" title="Permalink to this image">¶</a></p>
 </div>
 </div><div class="toctree-wrapper compound">
 </div>
diff --git a/docs/how_to/work_with_microtvm/micro_autotune.html b/docs/how_to/work_with_microtvm/micro_autotune.html
index 7dd982bf7..c29b605bb 100644
--- a/docs/how_to/work_with_microtvm/micro_autotune.html
+++ b/docs/how_to/work_with_microtvm/micro_autotune.html
@@ -224,6 +224,7 @@
 <li class="toctree-l3"><a class="reference internal" href="micro_ethosu.html">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_reference_vm.html">microTVM Reference Virtual Machines</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
 </ul>
 </li>
@@ -555,10 +556,10 @@ the tuned operator.</p>
 ########## Build without Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs
 ---------                                     ---                                           --------  -------  -----              ------  -------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  313.4     98.748   (1, 2, 10, 10, 3)  2       1
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.042     0.959    (1, 6, 10, 10)     1       1
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.93      0.293    (1, 1, 10, 10, 3)  1       1
-Total_time                                    -                                             317.372   -        -                  -       -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  314.7     98.745   (1, 2, 10, 10, 3)  2       1
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       3.076     0.965    (1, 6, 10, 10)     1       1
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.924     0.29     (1, 1, 10, 10, 3)  1       1
+Total_time                                    -                                             318.7     -        -                  -       -
 </pre></div>
 </div>
 </div>
@@ -610,10 +611,10 @@ Total_time                                    -
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>########## Build with Autotuning ##########
 Node Name                                     Ops                                           Time(us)  Time(%)  Shape              Inputs  Outputs
 ---------                                     ---                                           --------  -------  -----              ------  -------
-tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  151.6     98.264   (1, 6, 10, 10, 1)  2       1
-tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.74      1.128    (1, 6, 10, 10)     1       1
-tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.939     0.609    (1, 1, 10, 10, 3)  1       1
-Total_time                                    -                                             154.279   -        -                  -       -
+tvmgen_default_fused_nn_contrib_conv2d_NCHWc  tvmgen_default_fused_nn_contrib_conv2d_NCHWc  133.0     97.979   (1, 6, 10, 10, 1)  2       1
+tvmgen_default_fused_layout_transform_1       tvmgen_default_fused_layout_transform_1       1.838     1.354    (1, 6, 10, 10)     1       1
+tvmgen_default_fused_layout_transform         tvmgen_default_fused_layout_transform         0.905     0.667    (1, 1, 10, 10, 3)  1       1
+Total_time                                    -                                             135.744   -        -                  -       -
 </pre></div>
 </div>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-autotune-py">
diff --git a/docs/how_to/work_with_microtvm/micro_ethosu.html b/docs/how_to/work_with_microtvm/micro_ethosu.html
index 6f78194df..85665bc61 100644
--- a/docs/how_to/work_with_microtvm/micro_ethosu.html
+++ b/docs/how_to/work_with_microtvm/micro_ethosu.html
@@ -230,6 +230,7 @@
 </li>
 <li class="toctree-l3"><a class="reference internal" href="micro_reference_vm.html">microTVM Reference Virtual Machines</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
 </ul>
 </li>
diff --git a/docs/how_to/work_with_microtvm/micro_reference_vm.html b/docs/how_to/work_with_microtvm/micro_reference_vm.html
index c60dbd8ce..c4b2102a4 100644
--- a/docs/how_to/work_with_microtvm/micro_reference_vm.html
+++ b/docs/how_to/work_with_microtvm/micro_reference_vm.html
@@ -220,6 +220,7 @@
 </ul>
 </li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
 </ul>
 </li>
diff --git a/docs/how_to/work_with_microtvm/micro_tflite.html b/docs/how_to/work_with_microtvm/micro_tflite.html
index dd21dcaf1..055887f8a 100644
--- a/docs/how_to/work_with_microtvm/micro_tflite.html
+++ b/docs/how_to/work_with_microtvm/micro_tflite.html
@@ -45,7 +45,7 @@
     <script type="text/javascript" src="../../_static/js/tlcpack_theme.js"></script>
     <link rel="index" title="Index" href="../../genindex.html" />
     <link rel="search" title="Search" href="../../search.html" />
-    <link rel="next" title="Executing a Tiny Model with TVMC Micro" href="micro_tvmc.html" />
+    <link rel="next" title="Training Vision Models for microTVM on Arduino" href="micro_train.html" />
     <link rel="prev" title="microTVM Reference Virtual Machines" href="micro_reference_vm.html" /> 
 </head>
 
@@ -220,6 +220,7 @@
 <li class="toctree-l4"><a class="reference internal" href="#defining-the-target">Defining the target</a></li>
 </ul>
 </li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
 </ul>
 </li>
@@ -681,7 +682,7 @@ to stand in for an attached microcontroller.</p>
 
     <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
       
-        <a href="micro_tvmc.html" class="btn btn-neutral float-right" title="Executing a Tiny Model with TVMC Micro" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
+        <a href="micro_train.html" class="btn btn-neutral float-right" title="Training Vision Models for microTVM on Arduino" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
       
       
         <a href="micro_reference_vm.html" class="btn btn-neutral float-left" title="microTVM Reference Virtual Machines" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
diff --git a/docs/how_to/work_with_microtvm/micro_train.html b/docs/how_to/work_with_microtvm/micro_train.html
new file mode 100644
index 000000000..7487eaeb7
--- /dev/null
+++ b/docs/how_to/work_with_microtvm/micro_train.html
@@ -0,0 +1,1046 @@
+<!DOCTYPE html>
+<html class="writer-html5" lang="en" >
+<head>
+  <meta charset="utf-8">
+  
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  
+  <title>Training Vision Models for microTVM on Arduino &mdash; tvm 0.9.dev0 documentation</title>
+  
+
+  
+  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
+  <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" />
+  <link rel="stylesheet" href="../../_static/pygments.css" type="text/css" />
+  <link rel="stylesheet" href="../../_static/gallery.css" type="text/css" />
+  <link rel="stylesheet" href="../../_static/css/tlcpack_theme.css" type="text/css" />
+
+  
+  
+    <link rel="shortcut icon" href="../../_static/tvm-logo-square.png"/>
+  
+
+  
+  
+  
+  
+    
+      <script type="text/javascript" id="documentation_options" data-url_root="../../" src="../../_static/documentation_options.js"></script>
+        <script src="../../_static/jquery.js"></script>
+        <script src="../../_static/underscore.js"></script>
+        <script src="../../_static/doctools.js"></script>
+    
+    <script type="text/javascript" src="../../_static/js/theme.js"></script>
+
+    
+    <script type="text/javascript" src="../../_static/js/tlcpack_theme.js"></script>
+    <link rel="index" title="Index" href="../../genindex.html" />
+    <link rel="search" title="Search" href="../../search.html" />
+    <link rel="next" title="Executing a Tiny Model with TVMC Micro" href="micro_tvmc.html" />
+    <link rel="prev" title="microTVM with TFLite Models" href="micro_tflite.html" /> 
+</head>
+
+<body class="wy-body-for-nav">
+
+   
+  <div class="wy-grid-for-nav">
+    
+    
+<header class="header">
+    <div class="innercontainer">
+      <div class="headerInner d-flex justify-content-between align-items-center">
+          <div class="headerLogo">
+               <a href="https://tvm.apache.org/"><img src=https://tvm.apache.org/assets/images/logo.svg alt="logo"></a>
+          </div>
+
+          <div id="headMenu" class="headerNav">
+            <button type="button" id="closeHeadMenu" class="navCloseBtn"><img src="../../_static/img/close-icon.svg" alt="Close"></button>
+             <ul class="nav">
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvm.apache.org/community>Community</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvm.apache.org/download>Download</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvm.apache.org/vta>VTA</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvm.apache.org/blog>Blog</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvm.apache.org/docs>Docs</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://tvmconf.org>Conference</a>
+                </li>
+                <li class="nav-item">
+                   <a class="nav-link" href=https://github.com/apache/tvm/>Github</a>
+                </li>
+             </ul>
+               <div class="responsivetlcdropdown">
+                 <button type="button" class="btn-link">
+                   ASF
+                 </button>
+                 <ul>
+                     <li>
+                       <a href=https://apache.org/>Apache Homepage</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/licenses/>License</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/foundation/sponsorship.html>Sponsorship</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/security/>Security</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/foundation/thanks.html>Thanks</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/events/current-event>Events</a>
+                     </li>
+                 </ul>
+               </div>
+          </div>
+            <div class="responsiveMenuIcon">
+              <button type="button" id="menuBtn" class="btn-menu"><img src="../../_static/img/menu-icon.svg" alt="Menu Icon"></button>
+            </div>
+
+            <div class="tlcDropdown">
+              <div class="dropdown">
+                <button type="button" class="btn-link dropdown-toggle" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
+                  ASF
+                </button>
+                <div class="dropdown-menu dropdown-menu-right">
+                  <ul>
+                     <li>
+                       <a href=https://apache.org/>Apache Homepage</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/licenses/>License</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/foundation/sponsorship.html>Sponsorship</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/security/>Security</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/foundation/thanks.html>Thanks</a>
+                     </li>
+                     <li>
+                       <a href=https://www.apache.org/events/current-event>Events</a>
+                     </li>
+                  </ul>
+                </div>
+              </div>
+          </div>
+       </div>
+    </div>
+ </header>
+ 
+    <nav data-toggle="wy-nav-shift" class="wy-nav-side fixed">
+      <div class="wy-side-scroll">
+        <div class="wy-side-nav-search" >
+          
+
+          
+            <a href="../../index.html">
+          
+
+          
+            
+            <img src="../../_static/tvm-logo-small.png" class="logo" alt="Logo"/>
+          
+          </a>
+
+          
+            
+            
+                <div class="version">
+                  0.9.dev0
+                </div>
+            
+          
+
+          
+<div role="search">
+  <form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
+    <input type="text" name="q" placeholder="Search docs" />
+    <input type="hidden" name="check_keywords" value="yes" />
+    <input type="hidden" name="area" value="default" />
+  </form>
+</div>
+
+          
+        </div>
+
+        
+        <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
+          
+            
+            
+              
+            
+            
+              <p class="caption" role="heading"><span class="caption-text">Getting Started</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../install/index.html">Installing TVM</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../contribute/index.html">Contributor Guide</a></li>
+</ul>
+<p class="caption" role="heading"><span class="caption-text">User Guide</span></p>
+<ul class="current">
+<li class="toctree-l1"><a class="reference internal" href="../../tutorial/index.html">User Tutorial</a></li>
+<li class="toctree-l1 current"><a class="reference internal" href="../index.html">How To Guides</a><ul class="current">
+<li class="toctree-l2"><a class="reference internal" href="../compile_models/index.html">Compile Deep Learning Models</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../deploy/index.html">Deploy Models and Integrate TVM</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../work_with_relay/index.html">Work With Relay</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../work_with_schedules/index.html">Work With Tensor Expression and Schedules</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../optimize_operators/index.html">Optimize Tensor Operators</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tune_with_autotvm/index.html">Auto-Tune with Templates and AutoTVM</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../tune_with_autoscheduler/index.html">Use AutoScheduler for Template-Free Scheduling</a></li>
+<li class="toctree-l2 current"><a class="reference internal" href="index.html">Work With microTVM</a><ul class="current">
+<li class="toctree-l3"><a class="reference internal" href="micro_autotune.html">Autotuning with microTVM</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_ethosu.html">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_reference_vm.html">microTVM Reference Virtual Machines</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3 current"><a class="current reference internal" href="#">Training Vision Models for microTVM on Arduino</a><ul>
+<li class="toctree-l4"><a class="reference internal" href="#motivation">Motivation</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#downloading-the-data">Downloading the Data</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#loading-the-data">Loading the Data</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#id1">Loading the Data</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#quantization">Quantization</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#compiling-with-tvm-for-arduino">Compiling With TVM For Arduino</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#testing-our-arduino-project">Testing our Arduino Project</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#writing-our-arduino-script">Writing our Arduino Script</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#uploading-to-our-device">Uploading to Our Device</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#summary">Summary</a></li>
+</ul>
+</li>
+<li class="toctree-l3"><a class="reference internal" href="micro_tvmc.html">Executing a Tiny Model with TVMC Micro</a></li>
+</ul>
+</li>
+<li class="toctree-l2"><a class="reference internal" href="../extend_tvm/index.html">Extend TVM</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../profile/index.html">Profile Models</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../errors.html">Handle TVM Errors</a></li>
+<li class="toctree-l2"><a class="reference internal" href="../../faq.html">Frequently Asked Questions</a></li>
+</ul>
+</li>
+</ul>
+<p class="caption" role="heading"><span class="caption-text">Developer Guide</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../dev/tutorial/index.html">Developer Tutorial</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../dev/how_to/how_to.html">Developer How-To Guide</a></li>
+</ul>
+<p class="caption" role="heading"><span class="caption-text">Architecture  Guide</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../arch/index.html">Design and Architecture</a></li>
+</ul>
+<p class="caption" role="heading"><span class="caption-text">Topic Guides</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../topic/microtvm/index.html">microTVM: TVM on bare-metal</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../topic/vta/index.html">VTA: Versatile Tensor Accelerator</a></li>
+</ul>
+<p class="caption" role="heading"><span class="caption-text">Reference Guide</span></p>
+<ul>
+<li class="toctree-l1"><a class="reference internal" href="../../reference/langref/index.html">Language Reference</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../reference/api/python/index.html">Python API</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../reference/api/links.html">Other APIs</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../reference/publications.html">Publications</a></li>
+<li class="toctree-l1"><a class="reference internal" href="../../genindex.html">Index</a></li>
+</ul>
+
+            
+          
+        </div>
+        
+      </div>
+    </nav>
+
+    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
+      
+      <nav class="wy-nav-top" aria-label="top navigation" data-toggle="wy-nav-top">
+        
+            <div class="togglemenu">
+
+            </div>
+            <div class="nav-content">
+              <!-- tvm -->
+              Table of content
+            </div>
+        
+      </nav>
+
+
+      <div class="wy-nav-content">
+        
+        <div class="rst-content">
+        
+
+          
+
+<div role="navigation" aria-label="breadcrumbs navigation">
+
+  <ul class="wy-breadcrumbs">
+    
+      <li><a href="../../index.html">Docs</a> <span class="br-arrow">></span></li>
+        
+          <li><a href="../index.html">How To Guides</a> <span class="br-arrow">></span></li>
+        
+          <li><a href="index.html">Work With microTVM</a> <span class="br-arrow">></span></li>
+        
+      <li>Training Vision Models for microTVM on Arduino</li>
+    
+    
+      <li class="wy-breadcrumbs-aside">
+        
+            
+            <a href="../../_sources/how_to/work_with_microtvm/micro_train.rst.txt" rel="nofollow"> <img src="../../_static//img/source.svg" alt="viewsource"/></a>
+          
+        
+      </li>
+    
+  </ul>
+
+  
+  <hr/>
+</div>
+          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
+           <div itemprop="articleBody">
+            
+  <div class="sphx-glr-download-link-note admonition note">
+<p class="admonition-title">Note</p>
+<p>Click <a class="reference internal" href="#sphx-glr-download-how-to-work-with-microtvm-micro-train-py"><span class="std std-ref">here</span></a> to download the full example code</p>
+</div>
+<div class="sphx-glr-example-title section" id="training-vision-models-for-microtvm-on-arduino">
+<span id="microtvm-train-arduino"></span><span id="sphx-glr-how-to-work-with-microtvm-micro-train-py"></span><h1>Training Vision Models for microTVM on Arduino<a class="headerlink" href="#training-vision-models-for-microtvm-on-arduino" title="Permalink to this headline">¶</a></h1>
+<p><strong>Author</strong>: <a class="reference external" href="https://github.com/guberti">Gavin Uberti</a></p>
+<p>This tutorial shows how MobileNetV1 models can be trained
+to fit on embedded devices, and how those models can be
+deployed to Arduino using TVM.</p>
+<div class="admonition note">
+<p class="admonition-title">Note</p>
+<p>This tutorial is best viewed as a Jupyter Notebook. You can download and run it locally
+using the link at the bottom of this page, or open it online for free using Google Colab.
+Click the icon below to open in Google Colab.</p>
+</div>
+<a class="reference external image-reference" href="https://colab.research.google.com/github/guberti/tvm-site/blob/asf-site/docs/_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb"><img alt="https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png" class="align-center" src="https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/images/utilities/colab_button.png" style="width: 300px;" /></a>
+<div class="section" id="motivation">
+<h2>Motivation<a class="headerlink" href="#motivation" title="Permalink to this headline">¶</a></h2>
+<p>When building IoT devices, we often want them to <strong>see and understand</strong> the world around them.
+This can take many forms, but often a device will want to know if a certain <strong>kind of
+object</strong> is in its field of vision.</p>
+<p>For example, a security camera might look for <strong>people</strong>, so it can decide whether to save a video
+to memory. A traffic light might look for <strong>cars</strong>, so it can judge which lights should change
+first. Or a forest camera might look for a <strong>kind of animal</strong>, so it can help estimate how large
+the animal population is.</p>
+<p>To make these devices affordable, we would like them to need only a low-cost processor like the
+<a class="reference external" href="https://www.nordicsemi.com/Products/nRF52840">nRF52840</a> (costing five dollars each on Mouser) or the <a class="reference external" href="https://www.raspberrypi.com/products/rp2040/">RP2040</a> (just $1.45 each!).</p>
+<p>These devices have very little memory (~250 KB RAM), meaning that no conventional edge AI
+vision model (like MobileNet or EfficientNet) will be able to run. In this tutorial, we will
+show how these models can be modified to work around this requirement. Then, we will use TVM
+to compile and deploy the modified model for an Arduino that uses one of these processors.</p>
+<div class="section" id="installing-the-prerequisites">
+<h3>Installing the Prerequisites<a class="headerlink" href="#installing-the-prerequisites" title="Permalink to this headline">¶</a></h3>
+<p>This tutorial will use TensorFlow, a widely used machine learning library created by Google,
+to train the model. TensorFlow is a very low-level library, however, so we will use the Keras
+interface to talk to it. We will also use TensorFlow Lite to perform quantization on
+our model, as TensorFlow by itself does not support this.</p>
+<p>Once we have our generated model, we will use TVM to compile and test it. To avoid having to
+build from source, we’ll install <code class="docutils literal notranslate"><span class="pre">tlcpack</span></code> - a community build of TVM. Lastly, we’ll also
+install <code class="docutils literal notranslate"><span class="pre">imagemagick</span></code> and <code class="docutils literal notranslate"><span class="pre">curl</span></code> to preprocess data:</p>
+<blockquote>
+<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>%%bash
+pip install -q tensorflow tflite
+pip install -q tlcpack-nightly -f https://tlcpack.ai/wheels
+apt-get -qq install imagemagick curl
+
+<span class="c1"># Install Arduino CLI and library for Nano 33 BLE</span>
+curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh <span class="p">|</span> sh
+/content/bin/arduino-cli core update-index
+/content/bin/arduino-cli core install arduino:mbed_nano
+</pre></div>
+</div>
+</div></blockquote>
+</div>
+<div class="section" id="using-the-gpu">
+<h3>Using the GPU<a class="headerlink" href="#using-the-gpu" title="Permalink to this headline">¶</a></h3>
+<p>This tutorial demonstrates training a neural network, which requires a lot of computing power
+and will go much faster if you have a GPU. If you are viewing this tutorial on Google Colab, you
+can enable a GPU by going to <strong>Runtime-&gt;Change runtime type</strong> and selecting “GPU” as the hardware
+accelerator. If you are running locally, you can <a class="reference external" href="https://www.tensorflow.org/guide/gpu">follow TensorFlow’s guide</a> instead.</p>
+<p>We can test our GPU installation with the following code:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="nn">tf</span>
+
+<span class="k">if</span> <span class="ow">not</span> <span class="n">tf</span><span class="o">.</span><span class="n">test</span><span class="o">.</span><span class="n">gpu_device_name</span><span class="p">():</span>
+    <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;No GPU was detected!&quot;</span><span class="p">)</span>
+    <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Model training will take much longer (~30 minutes instead of ~5)&quot;</span><span class="p">)</span>
+<span class="k">else</span><span class="p">:</span>
+    <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;GPU detected - you&#39;re good to go.&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>No GPU was detected!
+Model training will take much longer (~30 minutes instead of ~5)
+</pre></div>
+</div>
+</div>
+<div class="section" id="choosing-our-work-dir">
+<h3>Choosing Our Work Dir<a class="headerlink" href="#choosing-our-work-dir" title="Permalink to this headline">¶</a></h3>
+<p>We need to pick a directory where our image datasets, trained model, and eventual Arduino sketch
+will all live. If running on Google Colab, we’ll save everything in <code class="docutils literal notranslate"><span class="pre">/root</span></code> (aka <code class="docutils literal notranslate"><span class="pre">~</span></code>) but you’ll
+probably want to store it elsewhere if running locally. Note that this variable only affects the
+Python scripts - if you change it, you’ll have to adjust the Bash commands to match.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
+
+<span class="n">FOLDER</span> <span class="o">=</span> <span class="s2">&quot;/root&quot;</span>
+<span class="c1"># sphinx_gallery_start_ignore</span>
+<span class="kn">import</span> <span class="nn">tempfile</span>
+
+<span class="n">FOLDER</span> <span class="o">=</span> <span class="n">tempfile</span><span class="o">.</span><span class="n">mkdtemp</span><span class="p">()</span>
+<span class="c1"># sphinx_gallery_end_ignore</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="downloading-the-data">
+<h2>Downloading the Data<a class="headerlink" href="#downloading-the-data" title="Permalink to this headline">¶</a></h2>
+<p>Convolutional neural networks usually learn by looking at many images, along with labels telling
+the network what those images are. To get these images, we’ll need a publicly available dataset
+with thousands of images of all sorts of objects and labels of what’s in each image. Since we’re
+trying to distinguish two classes (cars and everything else), we’ll also need a bunch of images
+that <strong>aren’t</strong> of cars.</p>
+<p>In this tutorial, we’ll create a model to detect if an image contains a <strong>car</strong>, but you can use
+whatever category you like! Just change the source URL below to one containing images of another
+type of object.</p>
+<p>To get our car images, we’ll be downloading the <a class="reference external" href="http://ai.stanford.edu/~jkrause/cars/car_dataset.html">Stanford Cars dataset</a>,
+which contains 16,185 full color images of cars. We’ll also need images of random things that
+aren’t cars, so we’ll use the <a class="reference external" href="https://cocodataset.org/#home">COCO 2017</a> validation set (it’s
+smaller, and thus faster to download, than the full training set; training on the full data set
+would yield better results). Note that there are some cars in the COCO 2017 data set, but it’s
+a small enough fraction not to matter - just keep in mind that this will drive down our perceived
+accuracy slightly.</p>
+<p>We could use the TensorFlow dataloader utilities, but we’ll instead do it manually to make sure
+it’s easy to change the datasets being used. We’ll end up with the following file hierarchy:</p>
+<blockquote>
+<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span>/root
+├── images
+│   ├── target
+│   │   ├── 000001.jpg
+│   │   │ ...
+│   │   └── 016185.jpg
+│   ├── target.tgz
+│   ├── random
+│   │   ├── 000000000139.jpg
+│   │   │ ...
+│   │   └── 000000581781.jpg
+│   └── random.zip
+</pre></div>
+</div>
+</div></blockquote>
+<p>We should also note that the Stanford Cars training set has about 8k images, while the COCO 2017
+validation set has 5k images - it is not a 50/50 split! If we wanted to, we could weight these
+classes differently during training to correct for this (as sketched below), but training will
+still work if we ignore it. Downloading the Stanford Cars images should take about
+<strong>2 minutes</strong>, while the COCO 2017 validation set will take about <strong>1 minute</strong>.</p>
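+<p>As a brief aside, here is a minimal sketch - not part of this tutorial’s pipeline - of how such
+class weighting could be done with the <code class="docutils literal notranslate"><span class="pre">class_weight</span></code> argument to Keras’s <code class="docutils literal notranslate"><span class="pre">fit</span></code>. The counts below
+come from the datasets described above; check <code class="docutils literal notranslate"><span class="pre">class_names</span></code> on the loaded dataset to confirm
+the index ordering before using something like this:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Sketch only: weight each class inversely to its frequency so both classes
+# contribute equally to the loss. The index order must match the dataset&#39;s
+# class_names attribute (Keras assigns indices alphabetically by folder name).
+counts = [5000, 8144]  # images in &quot;random&quot; and &quot;target&quot;, respectively
+total = sum(counts)
+class_weight = {i: total / (len(counts) * n) for i, n in enumerate(counts)}
+# Later, during training: model.fit(train_dataset, class_weight=class_weight)
+</pre></div>
+</div>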
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">os</span>
+<span class="kn">import</span> <span class="nn">shutil</span>
+<span class="kn">import</span> <span class="nn">urllib.request</span>
+
+<span class="c1"># Download datasets</span>
+<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images&quot;</span><span class="p">)</span>
+<span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlretrieve</span><span class="p">(</span>
+    <span class="s2">&quot;http://ai.stanford.edu/~jkrause/car196/cars_train.tgz&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/target.tgz&quot;</span>
+<span class="p">)</span>
+<span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlretrieve</span><span class="p">(</span>
+    <span class="s2">&quot;http://images.cocodataset.org/zips/val2017.zip&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/random.zip&quot;</span>
+<span class="p">)</span>
+
+<span class="c1"># Extract them and rename their folders</span>
+<span class="n">shutil</span><span class="o">.</span><span class="n">unpack_archive</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/target.tgz&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images&quot;</span> [...]
+<span class="n">shutil</span><span class="o">.</span><span class="n">unpack_archive</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/random.zip&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images&quot;</span> [...]
+<span class="n">shutil</span><span class="o">.</span><span class="n">move</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/cars_train&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/target&quot;</span><sp [...]
+<span class="n">shutil</span><span class="o">.</span><span class="n">move</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/val2017&quot;</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/random&quot;</span><span  [...]
+</pre></div>
+</div>
+</div>
+<div class="section" id="loading-the-data">
+<h2>Loading the Data<a class="headerlink" href="#loading-the-data" title="Permalink to this headline">¶</a></h2>
+<p>Currently, our data is stored on-disk as JPG files of various sizes. To train with it, we’ll have
+to load the images into memory, resize them to be 64x64, and convert them to raw, uncompressed
+data. Keras’s <code class="docutils literal notranslate"><span class="pre">image_dataset_from_directory</span></code> will take care of most of this, though it loads
+images such that each pixel value is a float from 0 to 255.</p>
+<p>We’ll also need to load labels, though Keras will help with this. From our subdirectory structure,
+it knows the images in <code class="docutils literal notranslate"><span class="pre">/target</span></code> are one class, and those in <code class="docutils literal notranslate"><span class="pre">/random</span></code> another. Setting
+<code class="docutils literal notranslate"><span class="pre">label_mode='categorical'</span></code> tells Keras to convert these into <strong>categorical labels</strong> - a 2x1 vector
+that’s either <code class="docutils literal notranslate"><span class="pre">[1,</span> <span class="pre">0]</span></code> for an object of our target class, or <code class="docutils literal notranslate"><span class="pre">[0,</span> <span class="pre">1]</span></code> for anything else.
+We’ll also set <code class="docutils literal notranslate"><span class="pre">shuffle=True</span></code> to randomize the order of our examples.</p>
+<p>We will also <strong>batch</strong> the data - grouping samples into clumps to make our training go faster.
+A <code class="docutils literal notranslate"><span class="pre">batch_size</span></code> of <code class="docutils literal notranslate"><span class="pre">32</span></code> is a decent default.</p>
+<p>Lastly, in machine learning we generally want our inputs to be small numbers. We’ll thus use a
+<code class="docutils literal notranslate"><span class="pre">Rescaling</span></code> layer to change our images such that each pixel is a float between <code class="docutils literal notranslate"><span class="pre">0.0</span></code> and <code class="docutils literal notranslate"><span class="pre">1.0</span></code>,
+instead of <code class="docutils literal notranslate"><span class="pre">0</span></code> to <code class="docutils literal notranslate"><span class="pre">255</span></code>. We need to be careful not to rescale our categorical labels though, so
+we’ll use a <code class="docutils literal notranslate"><span class="pre">lambda</span></code> function.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">IMAGE_SIZE</span> <span class="o">=</span> <span class="p">(</span><span class="mi">64</span><span class="p">,</span> <span class="mi">64</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
+<span class="n">unscaled_dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">utils</span><span class="o">.</span><span class="n">image_dataset_from_directory</span><span class="p">(</span>
+    <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images&quot;</span><span class="p">,</span>
+    <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
+    <span class="n">shuffle</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
+    <span class="n">label_mode</span><span class="o">=</span><span class="s2">&quot;categorical&quot;</span><span class="p">,</span>
+    <span class="n">image_size</span><span class="o">=</span><span class="n">IMAGE_SIZE</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">2</span><span class="p">],</span>
+<span class="p">)</span>
+<span class="n">rescale</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Rescaling</span><span class="p">(</span><span class="n">scale</span><span class="o">=</span><span class="mf">1.0</span> <span class="o">/</span> <span class="mi">255</span><span class="p">)</span>
+<span class="n">full_dataset</span> <span class="o">=</span> <span class="n">unscaled_dataset</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">im</span><span class="p">,</span> <span class="n">lbl</span><span class="p">:</span> <span class="p">(</span><span class="n">rescale</span><span class="p">(</span><span class="n">im</span><span class="p">),</span> <span class="n">lbl</span><span class="p">))</span>
+</pre></div>
+</div>
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Found 13144 files belonging to 2 classes.
+</pre></div>
+</div>
+<div class="section" id="what-s-inside-our-dataset">
+<h3>What’s Inside Our Dataset?<a class="headerlink" href="#what-s-inside-our-dataset" title="Permalink to this headline">¶</a></h3>
+<p>Before giving this data set to our neural network, we ought to give it a quick visual inspection.
+Does the data look properly transformed? Do the labels seem appropriate? And what’s our ratio of
+objects to other stuff? We can display some examples from our datasets using <code class="docutils literal notranslate"><span class="pre">matplotlib</span></code>:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
+
+<span class="n">num_target_class</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/target/&quot;</span><span class="p">))</span>
+<span class="n">num_random_class</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">listdir</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/random/&quot;</span><span class="p">))</span>
+<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/target contains </span><span class="si">{</span><span class="n">num_target_class</span><span class="si">}</span><span class="s2"> images&quot;</span><span class="p">)</span>
+<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/images/random contains </span><span class="si">{</span><span class="n">num_random_class</span><span class="si">}</span><span class="s2"> images&quot;</span><span class="p">)</span>
+
+<span class="c1"># Show some samples and their labels</span>
+<span class="n">SAMPLES_TO_SHOW</span> <span class="o">=</span> <span class="mi">10</span>
+<span class="n">plt</span><span class="o">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">20</span><span class="p">,</span> <span class="mi">10</span><span class="p">))</span>
+<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">label</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">unscaled_dataset</span><span class="o">.</span><span class="n">unbatch</span><span class="p">()):</span>
+    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;=</span> <span class="n">SAMPLES_TO_SHOW</span><span class="p">:</span>
+        <span class="k">break</span>
+    <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="o">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">SAMPLES_TO_SHOW</span><span class="p">,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
+    <span class="n">plt</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">image</span><span class="o">.</span><span class="n">numpy</span><span class="p">()</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="s2">&quot;uint8&quot;</span><span class="p">))</span>
+    <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">label</span><span class="o">.</span><span class="n">numpy</span><span class="p">()))</span>
+    <span class="n">plt</span><span class="o">.</span><span class="n">axis</span><span class="p">(</span><span class="s2">&quot;off&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+<img alt="../../_images/sphx_glr_micro_train_001.png" class="sphx-glr-single-img" src="../../_images/sphx_glr_micro_train_001.png" />
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/tmp/tmpb2g6f7s3/images/target contains 8144 images
+/tmp/tmpb2g6f7s3/images/random contains 5000 images
+</pre></div>
+</div>
+</div>
+<div class="section" id="validating-our-accuracy">
+<h3>Validating our Accuracy<a class="headerlink" href="#validating-our-accuracy" title="Permalink to this headline">¶</a></h3>
+<p>While developing our model, we’ll often want to check how accurate it is (e.g. to see if it
+improves during training). How do we do this? We could just train it on <em>all</em> of the data, and
+then ask it to classify that same data. However, our model could cheat by just memorizing all of
+the samples, which would make it <em>appear</em> to have very high accuracy, but perform very badly in
+reality. In practice, this “memorizing” is called <strong>overfitting</strong>.</p>
+<p>To prevent this, we will set aside some of the data (we’ll use 20%) as a <strong>validation set</strong>. Our
+model will never be trained on validation data - we’ll only use it to check our model’s accuracy.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">num_batches</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">full_dataset</span><span class="p">)</span>
+<span class="n">train_dataset</span> <span class="o">=</span> <span class="n">full_dataset</span><span class="o">.</span><span class="n">take</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">num_batches</span> <span class="o">*</span> <span class="mf">0.8</span><span class="p">))</span>
+<span class="n">validation_dataset</span> <span class="o">=</span> <span class="n">full_dataset</span><span class="o">.</span><span class="n">skip</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">))</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="id1">
+<h2>Defining Our Model<a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h2>
+<p>In the past decade, <a class="reference external" href="https://en.wikipedia.org/wiki/Convolutional_neural_network">convolutional neural networks</a> have been widely
+adopted for image classification tasks. State-of-the-art models like <a class="reference external" href="https://arxiv.org/abs/2104.00298">EfficientNet V2</a> are able
+to perform image classification better than even humans! Unfortunately, these models have tens of
+millions of parameters, and thus won’t fit on cheap security camera computers.</p>
+<p>Our applications generally don’t need perfect accuracy - 90% is good enough. We can thus use the
+older and smaller MobileNet V1 architecture. But this <em>still</em> won’t be small enough - by default,
+MobileNet V1 with 224x224 inputs and alpha 1.0 takes ~50 MB just to <strong>store</strong>. To reduce the size
+of the model, there are three knobs we can turn. First, we can reduce the size of the input images
+from 224x224 to 96x96 or 64x64, and Keras makes it easy to do this. We can also reduce the <strong>alpha</strong>
+of the model, from 1.0 to 0.25, which downscales the width of the network (and the number of
+filters) by a factor of four. And if we were really strapped for space, we could reduce the
+number of <strong>channels</strong> by making our model take grayscale images instead of RGB ones.</p>
+<p>In this tutorial, we will use an RGB 64x64 input image and alpha 0.25. This is not quite
+ideal, but it allows the finished model to fit in 192 KB of RAM, while still letting us perform
+transfer learning using the official TensorFlow source models (if we used alpha &lt;0.25 or a
+grayscale input, we wouldn’t be able to do this).</p>
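+<p>To get a feel for how much the <strong>alpha</strong> knob matters, we can compare parameter counts directly.
+This is only an illustrative check, not a step of the tutorial; passing <code class="docutils literal notranslate"><span class="pre">weights=None</span></code> builds
+randomly initialized models, so nothing needs to be downloaded:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Sketch only: alpha scales the number of filters and hence the parameter
+# count, while a smaller input size mainly reduces compute and activation
+# memory rather than the stored model size.
+for size, alpha in [(224, 1.0), (64, 0.25)]:
+    m = tf.keras.applications.MobileNet(
+        input_shape=(size, size, 3), alpha=alpha, weights=None
+    )
+    print(f&quot;{size}x{size}, alpha={alpha}: {m.count_params():,} parameters&quot;)
+</pre></div>
+</div>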
+<div class="section" id="what-is-transfer-learning">
+<h3>What is Transfer Learning?<a class="headerlink" href="#what-is-transfer-learning" title="Permalink to this headline">¶</a></h3>
+<p>Deep learning has <a class="reference external" href="https://paperswithcode.com/sota/image-classification-on-imagenet">dominated image classification</a> for a long time,
+but training neural networks takes a lot of time. When a neural network is trained “from scratch”,
+its parameters start out randomly initialized, forcing it to learn very slowly how to tell images
+apart.</p>
+<p>With transfer learning, we instead start with a neural network that’s <strong>already</strong> good at a
+specific task. In this example, that task is classifying images from <a class="reference external" href="https://www.image-net.org/">the ImageNet database</a>. This
+means the network already has some object detection capabilities, and is likely closer to what you
+want than a random model would be.</p>
+<p>This works especially well with image processing neural networks like MobileNet. In practice, it
+turns out the convolutional layers of the model (i.e. the first 90% of the layers) are used for
+identifying low-level features like lines and shapes - only the last few fully connected layers
+are used to determine how those shapes make up the objects the network is trying to detect.</p>
+<p>We can take advantage of this by starting training with a MobileNet model that was trained on
+ImageNet, and already knows how to identify those lines and shapes. We can then just remove the
+last few layers from this pretrained model, and add our own final layers. We’ll then train this
+conglomerate model for a few epochs on our cars vs non-cars dataset, to adjust the first layers
+and train the last layers from scratch. This process of training an already-partially-trained
+model is called <em>fine-tuning</em>.</p>
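+<p>A related option, not used in this tutorial (which fine-tunes every layer), is to <strong>freeze</strong>
+the pretrained layers so only our new tail learns. A hypothetical sketch with Keras, where
+<code class="docutils literal notranslate"><span class="pre">pretrained</span></code> is the MobileNet model downloaded below:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># Hypothetical variant: freeze the pretrained convolutional layers so
+# gradient updates only touch the newly added final layers
+for layer in pretrained.layers:
+    layer.trainable = False
+</pre></div>
+</div>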
+<p>Source MobileNets for transfer learning have been <a class="reference external" href="https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md">pretrained by the TensorFlow folks</a>, so we
+can just download the one closest to what we want (the 128x128 input model with 0.25 depth scale).</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models&quot;</span><span class="p">)</span>
+<span class="n">WEIGHTS_PATH</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/mobilenet_2_5_128_tf.h5&quot;</span>
+<span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlretrieve</span><span class="p">(</span>
+    <span class="s2">&quot;https://storage.googleapis.com/tensorflow/keras-applications/mobilenet/mobilenet_2_5_128_tf.h5&quot;</span><span class="p">,</span>
+    <span class="n">WEIGHTS_PATH</span><span class="p">,</span>
+<span class="p">)</span>
+
+<span class="n">pretrained</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">applications</span><span class="o">.</span><span class="n">MobileNet</span><span class="p">(</span>
+    <span class="n">input_shape</span><span class="o">=</span><span class="n">IMAGE_SIZE</span><span class="p">,</span> <span class="n">weights</span><span class="o">=</span><span class="n">WEIGHTS_PATH</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.25</span>
+<span class="p">)</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="modifying-our-network">
+<h3>Modifying Our Network<a class="headerlink" href="#modifying-our-network" title="Permalink to this headline">¶</a></h3>
+<p>As mentioned above, our pretrained model is designed to classify the 1,000 ImageNet categories,
+but we want to convert it to classify cars. Since only the bottom few layers are task-specific,
+we’ll <strong>cut off the last five layers</strong> of our original model. In their place we’ll build our own
+“tail” to the model by performing reshape, dropout, flatten, and softmax operations.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">model</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">models</span><span class="o">.</span><span class="n">Sequential</span><span class="p">()</span>
+
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">InputLayer</span><span class="p">(</span><span class="n">input_shape</span><span class="o">=</span><span class="n">IMAGE_SIZE</span><span class="p">))</span>
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="n">pretrained</span><span class="o">.</span><span class="n">inputs</span><span class="p">,</span> <span class="n">outputs</span><span class="o">=</span><span class="n">pre [...]
+
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Reshape</span><span class="p">((</span><span class="o">-</span><span class="mi">1</span><span class="p">,)))</span>
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Flatten</span><span class="p">())</span>
+<span class="n">model</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;softmax&quot;</span><span class="p">))</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="fine-tuning-our-network">
+<h3>Fine Tuning Our Network<a class="headerlink" href="#fine-tuning-our-network" title="Permalink to this headline">¶</a></h3>
+<p>When training neural networks, we must set a parameter called the <strong>learning rate</strong> that controls
+how fast our network learns. It must be set carefully - too slow, and our network will take
+forever to train; too fast, and our network won’t be able to learn some fine details. Generally
+for Adam (the optimizer we’re using), <code class="docutils literal notranslate"><span class="pre">0.001</span></code> is a pretty good learning rate (and is what’s
+recommended in the <a class="reference external" href="https://arxiv.org/abs/1412.6980">original paper</a>). However, in this case
+<code class="docutils literal notranslate"><span class="pre">0.0005</span></code> seems to work a little better.</p>
+<p>We’ll also pass the validation set from earlier to <code class="docutils literal notranslate"><span class="pre">model.fit</span></code>. This will evaluate how good our
+model is after each training epoch, and let us track how our model is improving. Once training is
+finished, the model should have a validation accuracy around <code class="docutils literal notranslate"><span class="pre">0.98</span></code> (meaning it was right 98% of
+the time on our validation set).</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">model</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span>
+    <span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">optimizers</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.0005</span><span class="p">),</span>
+    <span class="n">loss</span><span class="o">=</span><span class="s2">&quot;categorical_crossentropy&quot;</span><span class="p">,</span>
+    <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;accuracy&quot;</span><span class="p">],</span>
+<span class="p">)</span>
+<span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train_dataset</span><span class="p">,</span> <span class="n">validation_data</span><span class="o">=</span><span class="n">validation_dataset</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
+</pre></div>
+</div>
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Epoch 1/3
+328/328 - 54s - loss: 0.2303 - accuracy: 0.9204 - val_loss: 0.1384 - val_accuracy: 0.9577
+Epoch 2/3
+328/328 - 52s - loss: 0.0972 - accuracy: 0.9643 - val_loss: 0.1196 - val_accuracy: 0.9641
+Epoch 3/3
+328/328 - 52s - loss: 0.0668 - accuracy: 0.9751 - val_loss: 0.1581 - val_accuracy: 0.9520
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="quantization">
+<h2>Quantization<a class="headerlink" href="#quantization" title="Permalink to this headline">¶</a></h2>
+<p>We’ve done a decent job of reducing our model’s size so far - changing the input dimension,
+along with removing the bottom layers, reduced the model to just 219k parameters. However, each of
+these parameters is a <code class="docutils literal notranslate"><span class="pre">float32</span></code> that takes four bytes, so our model will take up almost one MB!</p>
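+<p>The arithmetic is easy to verify from the model itself (<code class="docutils literal notranslate"><span class="pre">count_params</span></code>
+is a standard Keras method):</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># 219,058 parameters * 4 bytes per float32 ~= 0.84 MB of weights alone
+print(model.count_params() * 4 / 2**20, &quot;MB&quot;)
+</pre></div>
+</div>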
+<p>Additionally, it might be the case that our hardware doesn’t have built-in support for floating
+point numbers. While most high-memory Arduinos (like the Nano 33 BLE) do have hardware support,
+some others (like the Arduino Due) do not. On any boards <em>without</em> dedicated hardware support,
+floating point multiplication will be extremely slow.</p>
+<p>To address both issues we will <strong>quantize</strong> the model - representing the weights as eight bit
+integers. It’s more complex than just rounding, though - to get the best performance, TensorFlow
+tracks how each neuron in our model activates, so we can figure out how to most accurately simulate
+the neuron’s original activations with integer operations.</p>
+<p>We will help TensorFlow do this by creating a representative dataset - a subset of the original
+that is used for tracking how those neurons activate. We’ll then pass this into a <code class="docutils literal notranslate"><span class="pre">TFLiteConverter</span></code>
+(Keras itself does not have quantization support) with an <code class="docutils literal notranslate"><span class="pre">Optimize</span></code> flag to tell TFLite to perform
+the conversion. By default, TFLite keeps the inputs and outputs of our model as floats, so we must
+explicitly tell it to avoid this behavior.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">representative_dataset</span><span class="p">():</span>
+    <span class="k">for</span> <span class="n">image_batch</span><span class="p">,</span> <span class="n">label_batch</span> <span class="ow">in</span> <span class="n">full_dataset</span><span class="o">.</span><span class="n">take</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
+        <span class="k">yield</span> <span class="p">[</span><span class="n">image_batch</span><span class="p">]</span>
+
+
+<span class="n">converter</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">TFLiteConverter</span><span class="o">.</span><span class="n">from_keras_model</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
+<span class="n">converter</span><span class="o">.</span><span class="n">optimizations</span> <span class="o">=</span> <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">Optimize</span><span class="o">.</span><span class="n">DEFAULT</span><span class="p">]</span>
+<span class="n">converter</span><span class="o">.</span><span class="n">representative_dataset</span> <span class="o">=</span> <span class="n">representative_dataset</span>
+<span class="n">converter</span><span class="o">.</span><span class="n">target_spec</span><span class="o">.</span><span class="n">supported_ops</span> <span class="o">=</span> <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">lite</span><span class="o">.</span><span class="n">OpsSet</span><span class="o">.</span><span class="n">TFLITE_BUILTINS_INT8</span><span class="p">]</span>
+<span class="n">converter</span><span class="o">.</span><span class="n">inference_input_type</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">uint8</span>
+<span class="n">converter</span><span class="o">.</span><span class="n">inference_output_type</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">uint8</span>
+
+<span class="n">quantized_model</span> <span class="o">=</span> <span class="n">converter</span><span class="o">.</span><span class="n">convert</span><span class="p">()</span>
+</pre></div>
+</div>
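+<p>As a quick sanity check, we can confirm the quantized model is roughly a quarter the size of its
+float32 counterpart (the exact figure varies with the quantization details):</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span># quantized_model is a bytes object, so len() gives its size
+print(len(quantized_model) // 1024, &quot;kB&quot;)  # expect roughly 250-350 kB
+</pre></div>
+</div>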
+<div class="section" id="download-the-model-if-desired">
+<h3>Download the Model if Desired<a class="headerlink" href="#download-the-model-if-desired" title="Permalink to this headline">¶</a></h3>
+<p>We’ve now got a finished model that you can use locally or in other tutorials (try autotuning
+this model or viewing it on <a class="reference external" href="https://netron.app/">https://netron.app/</a>). But before we do
+those things, we’ll have to write it to a file (<code class="docutils literal notranslate"><span class="pre">quantized.tflite</span></code>). If you’re running this
+tutorial on Google Colab, you’ll have to uncomment the last two lines to download the file
+after writing it.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">QUANTIZED_MODEL_PATH</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/quantized.tflite&quot;</span>
+<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">QUANTIZED_MODEL_PATH</span><span class="p">,</span> <span class="s2">&quot;wb&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
+    <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">quantized_model</span><span class="p">)</span>
+<span class="c1"># from google.colab import files</span>
+<span class="c1"># files.download(QUANTIZED_MODEL_PATH)</span>
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="compiling-with-tvm-for-arduino">
+<h2>Compiling With TVM For Arduino<a class="headerlink" href="#compiling-with-tvm-for-arduino" title="Permalink to this headline">¶</a></h2>
+<p>TensorFlow has a built-in framework for deploying to microcontrollers - <a class="reference external" href="https://www.tensorflow.org/lite/microcontrollers">TFLite Micro</a>. However,
+it’s poorly supported by development boards and does not support autotuning. We will use Apache
+TVM instead.</p>
+<p>TVM can be used either with its command line interface (<code class="docutils literal notranslate"><span class="pre">tvmc</span></code>) or with its Python interface. The
+Python interface is fully-featured and more stable, so we’ll use it here.</p>
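+<p>For reference, the command line flow would begin with <code class="docutils literal notranslate"><span class="pre">tvmc</span> <span class="pre">compile</span></code> - a minimal,
+illustrative sketch for a generic target (this tutorial uses the Python API, and the
+microcontroller-specific options are omitted here):</p>
+<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span># Illustrative only: compile a TFLite model with the tvmc CLI
+tvmc compile quantized.tflite --target llvm --output module.tar
+</pre></div>
+</div>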
+<p>TVM is an optimizing compiler, and optimizations to our model are performed in stages via
+<strong>intermediate representations</strong>. The first of these is <a class="reference external" href="https://arxiv.org/abs/1810.00952">Relay</a>, a high-level intermediate
+representation emphasizing portability. The conversion from <code class="docutils literal notranslate"><span class="pre">.tflite</span></code> to Relay is done without any
+knowledge of our “end goal” - the fact we intend to run this model on an Arduino.</p>
+<div class="section" id="choosing-an-arduino-board">
+<h3>Choosing an Arduino Board<a class="headerlink" href="#choosing-an-arduino-board" title="Permalink to this headline">¶</a></h3>
+<p>Next, we’ll have to decide exactly which Arduino board to use. The Arduino sketch that we
+ultimately generate should be compatible with any board, but knowing which board we are using in
+advance allows TVM to adjust its compilation strategy to get better performance.</p>
+<p>There is one catch - we need enough <strong>memory</strong> (flash and RAM) to be able to run our model. We
+won’t ever be able to run a complex vision model like a MobileNet on an Arduino Uno - that board
+only has 2 kB of RAM and 32 kB of flash! Our model has ~200,000 parameters, so there is just no
+way it could fit.</p>
+<p>For this tutorial, we will use the Nano 33 BLE, which has 1 MB of flash memory and 256 KB of RAM.
+However, any other Arduino with those specs or better should also work.</p>
+</div>
+<div class="section" id="generating-our-project">
+<h3>Generating our project<a class="headerlink" href="#generating-our-project" title="Permalink to this headline">¶</a></h3>
+<p>Next, we’ll compile the model to TVM’s MLF (model library format) intermediate representation,
+which consists of C/C++ code and is designed for autotuning. To improve performance, we’ll tell
+TVM that we’re compiling for the <code class="docutils literal notranslate"><span class="pre">nrf52840</span></code> microprocessor (the one the Nano 33 BLE uses). We’ll
+also tell it to use the C runtime (abbreviated <code class="docutils literal notranslate"><span class="pre">crt</span></code>) and to use ahead-of-time memory allocation
+(abbreviated <code class="docutils literal notranslate"><span class="pre">aot</span></code>, which helps reduce the model’s memory footprint). Lastly, we will disable
+vectorization with <code class="docutils literal notranslate"><span class="pre">&quot;tir.disable_vectorize&quot;:</span> <span class="pre">True</span></code>, as C has no native vectorized types.</p>
+<p>Once we have set these configuration parameters, we will call <code class="docutils literal notranslate"><span class="pre">tvm.relay.build</span></code> to compile our
+Relay model into the MLF intermediate representation. From here, we just need to call
+<code class="docutils literal notranslate"><span class="pre">tvm.micro.generate_project</span></code> and pass in the Arduino template project to finish compilation.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">shutil</span>
+<span class="kn">import</span> <span class="nn">tflite</span>
+<span class="kn">import</span> <span class="nn">tvm</span>
+
+<span class="c1"># Method to load model is different in TFLite 1 vs 2</span>
+<span class="k">try</span><span class="p">:</span>  <span class="c1"># TFLite 2.1 and above</span>
+    <span class="n">tflite_model</span> <span class="o">=</span> <span class="n">tflite</span><span class="o">.</span><span class="n">Model</span><span class="o">.</span><span class="n">GetRootAsModel</span><span class="p">(</span><span class="n">quantized_model</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
+<span class="k">except</span> <span class="ne">AttributeError</span><span class="p">:</span>  <span class="c1"># Fall back to TFLite 1.14 method</span>
+    <span class="n">tflite_model</span> <span class="o">=</span> <span class="n">tflite</span><span class="o">.</span><span class="n">Model</span><span class="o">.</span><span class="n">Model</span><span class="o">.</span><span class="n">GetRootAsModel</span><span class="p">(</span><span class="n">quantized_model</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
+
+<span class="c1"># Convert to the Relay intermediate representation</span>
+<span class="n">mod</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <a href="../../reference/api/python/relay/frontend.html#tvm.relay.frontend.from_tflite" title="View documentation for tvm.relay.frontend.from_tflite"><span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">frontend</span><span class="o">.</span><span class="n">from_tflite</span></a><span class="p">(</span><span class="n">t [...]
+
+<span class="c1"># Set configuration flags to improve performance</span>
+<span class="n">target</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">target</span><span class="o">.</span><span class="n">target</span><span class="o">.</span><span class="n">micro</span><span class="p">(</span><span class="s2">&quot;nrf52840&quot;</span><span class="p">)</span>
+<span class="n">runtime</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">Runtime</span><span class="p">(</span><span class="s2">&quot;crt&quot;</span><span class="p">)</span>
+<span class="n">executor</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">Executor</span><span class="p">(</span><span class="s2">&quot;aot&quot;</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;unpacked-api&quot;</span><span class="p">:</span> <span class="kc">True</span><span class="p">})</span>
+
+<span class="c1"># Convert to the MLF intermediate representation</span>
+<span class="k">with</span> <a href="../../reference/api/python/ir.html#tvm.transform.PassContext" title="View documentation for tvm.transform.PassContext"><span class="n">tvm</span><span class="o">.</span><span class="n">transform</span><span class="o">.</span><span class="n">PassContext</span></a><span class="p">(</span><span class="n">opt_level</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">config</span><span class="o">=</span><span cla [...]
+    <span class="n">mod</span> <span class="o">=</span> <span class="n">tvm</span><span class="o">.</span><span class="n">relay</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">mod</span><span class="p">,</span> <span class="n">target</span><span class="p">,</span> <span class="n">runtime</span><span class="o">=</span><span class="n">runtime</span><span class="p">,</span> <span class="n">executor</span><span class="o">=</span><span class=" [...]
+
+<span class="c1"># Generate an Arduino project from the MLF intermediate representation</span>
+<span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/project&quot;</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
+<span class="n">arduino_project</span> <span class="o">=</span> <a href="../../reference/api/python/micro.html#tvm.micro.generate_project" title="View documentation for tvm.micro.generate_project"><span class="n">tvm</span><span class="o">.</span><span class="n">micro</span><span class="o">.</span><span class="n">generate_project</span></a><span class="p">(</span>
+    <a href="../../reference/api/python/micro.html#tvm.micro.get_microtvm_template_projects" title="View documentation for tvm.micro.get_microtvm_template_projects"><span class="n">tvm</span><span class="o">.</span><span class="n">micro</span><span class="o">.</span><span class="n">get_microtvm_template_projects</span></a><span class="p">(</span><span class="s2">&quot;arduino&quot;</span><span class="p">),</span>
+    <span class="n">mod</span><span class="p">,</span>
+    <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/project&quot;</span><span class="p">,</span>
+    <span class="p">{</span>
+        <span class="s2">&quot;arduino_board&quot;</span><span class="p">:</span> <span class="s2">&quot;nano33ble&quot;</span><span class="p">,</span>
+        <span class="s2">&quot;arduino_cli_cmd&quot;</span><span class="p">:</span> <span class="s2">&quot;/content/bin/arduino-cli&quot;</span><span class="p">,</span>
+        <span class="s2">&quot;project_type&quot;</span><span class="p">:</span> <span class="s2">&quot;example_project&quot;</span><span class="p">,</span>
+    <span class="p">},</span>
+<span class="p">)</span>
+</pre></div>
+</div>
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
+  &quot;target_host parameter is going to be deprecated. &quot;
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="testing-our-arduino-project">
+<h2>Testing our Arduino Project<a class="headerlink" href="#testing-our-arduino-project" title="Permalink to this headline">¶</a></h2>
+<p>Consider the following two 224x224 images from the author’s camera roll - one of a car, one not.
+We will test our Arduino project by loading both of these images and executing the compiled model
+on them.</p>
+<a class="reference internal image-reference" href="https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png"><img alt="https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png" class="align-center" src="https://raw.githubusercontent.com/guberti/web-data/micro-train-tutorial-data/testdata/microTVM/data/model_train_images_combined.png" style [...]
+<p>Currently, these are 224x224 PNG images we can download from Imgur. Before we can feed in these
+images, we’ll need to resize and convert them to raw data, which can be done with <code class="docutils literal notranslate"><span class="pre">imagemagick</span></code>.</p>
+<p>It’s also challenging to load raw data onto an Arduino, as only C/CPP files (and similar) are
+compiled. We can work around this by embedding our raw data in a hard-coded C array with the
+built-in utility <code class="docutils literal notranslate"><span class="pre">bin2c</span></code> that will output a file like below:</p>
+<blockquote>
+<div><div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">CAR_IMAGE</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
+<span class="w">  </span><span class="mh">0x22</span><span class="p">,</span><span class="mh">0x23</span><span class="p">,</span><span class="mh">0x14</span><span class="p">,</span><span class="mh">0x22</span><span class="p">,</span><span class="w"></span>
+<span class="w">  </span><span class="p">...</span><span class="w"></span>
+<span class="w">  </span><span class="mh">0x07</span><span class="p">,</span><span class="mh">0x0e</span><span class="p">,</span><span class="mh">0x08</span><span class="p">,</span><span class="mh">0x08</span><span class="w"></span>
+<span class="p">};</span><span class="w"></span>
+</pre></div>
+</div>
+</div></blockquote>
+<p>We can do both of these things with a few lines of Bash code:</p>
+<blockquote>
+<div><div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>%%bash
+mkdir -p ~/tests
+curl <span class="s2">&quot;https://i.imgur.com/JBbEhxN.png&quot;</span> -o ~/tests/car_224.png
+convert ~/tests/car_224.png -resize <span class="m">64</span> ~/tests/car_64.png
+stream ~/tests/car_64.png ~/tests/car.raw
+bin2c -c -st ~/tests/car.raw --name CAR_IMAGE &gt; ~/models/project/car.c
+
+curl <span class="s2">&quot;https://i.imgur.com/wkh7Dx2.png&quot;</span> -o ~/tests/catan_224.png
+convert ~/tests/catan_224.png -resize <span class="m">64</span> ~/tests/catan_64.png
+stream ~/tests/catan_64.png ~/tests/catan.raw
+bin2c -c -st ~/tests/catan.raw --name CATAN_IMAGE &gt; ~/models/project/catan.c
+</pre></div>
+</div>
+</div></blockquote>
+</div>
+<div class="section" id="writing-our-arduino-script">
+<h2>Writing our Arduino Script<a class="headerlink" href="#writing-our-arduino-script" title="Permalink to this headline">¶</a></h2>
+<p>We now need a little bit of Arduino code to read the two binary arrays we just generated, run the
+model on them, and log the output to the serial monitor. This file will replace <code class="docutils literal notranslate"><span class="pre">arduino_sketch.ino</span></code>
+as the main file of our sketch. You’ll have to copy this code in manually.</p>
+<blockquote>
+<div><div class="highlight-c notranslate"><div class="highlight"><pre><span></span><span class="o">%%</span><span class="n">writefile</span><span class="w"> </span><span class="o">/</span><span class="n">root</span><span class="o">/</span><span class="n">models</span><span class="o">/</span><span class="n">project</span><span class="p">.</span><span class="n">ino</span><span class="w"></span>
+<span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;src/model.h&quot;</span><span class="cp"></span>
+<span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;car.c&quot;</span><span class="cp"></span>
+<span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;catan.c&quot;</span><span class="cp"></span>
+
+<span class="kt">void</span><span class="w"> </span><span class="n">setup</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">begin</span><span class="p">(</span><span class="mi">9600</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">TVMInitialize</span><span class="p">();</span><span class="w"></span>
+<span class="p">}</span><span class="w"></span>
+
+<span class="kt">void</span><span class="w"> </span><span class="n">loop</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
+<span class="w">  </span><span class="kt">uint8_t</span><span class="w"> </span><span class="n">result_data</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">println</span><span class="p">(</span><span class="s">&quot;Car results:&quot;</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">TVMExecute</span><span class="p">(</span><span class="n">const_cast</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">*&gt;</span><span class="p">(</span><span class="n">CAR_IMAGE</span><span class="p">),</span><span class="w"> </span><span class="n">result_data</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">result_data</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"> </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;, &quot;</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">result_data</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"> </span><span class="n">Serial</span><span class="p">.</span><span class="n">println</span><span class="p">();</span><span class="w"></span>
+
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">println</span><span class="p">(</span><span class="s">&quot;Other object results:&quot;</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">TVMExecute</span><span class="p">(</span><span class="n">const_cast</span><span class="o">&lt;</span><span class="kt">uint8_t</span><span class="o">*&gt;</span><span class="p">(</span><span class="n">CATAN_IMAGE</span><span class="p">),</span><span class="w"> </span><span class="n">result_data</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">result_data</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span><span class="w"> </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;, &quot;</span><span class="p">);</span><span class="w"></span>
+<span class="w">  </span><span class="n">Serial</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">result_data</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span><span class="w"> </span><span class="n">Serial</span><span class="p">.</span><span class="n">println</span><span class="p">();</span><span class="w"></span>
+
+<span class="w">  </span><span class="n">delay</span><span class="p">(</span><span class="mi">1000</span><span class="p">);</span><span class="w"></span>
+<span class="p">}</span><span class="w"></span>
+</pre></div>
+</div>
+</div></blockquote>
+<div class="section" id="compiling-our-code">
+<h3>Compiling Our Code<a class="headerlink" href="#compiling-our-code" title="Permalink to this headline">¶</a></h3>
+<p>Now that our project has been generated, TVM’s job is mostly done! We can still call
+<code class="docutils literal notranslate"><span class="pre">arduino_project.build()</span></code> and <code class="docutils literal notranslate"><span class="pre">arduino_project.upload()</span></code>, but these just use <code class="docutils literal notranslate"><span class="pre">arduino-cli</span></code>’s
+compile and flash commands underneath. We could also begin autotuning our model, but that’s a
+subject for a different tutorial. To finish up, we’ll verify no compiler errors are thrown
+by our project:</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/project/build&quot;</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><spa [...]
+<span class="c1"># sphinx_gallery_start_ignore</span>
+<span class="kn">from</span> <span class="nn">unittest.mock</span> <span class="kn">import</span> <span class="n">MagicMock</span>
+
+<span class="n">arduino_project</span> <span class="o">=</span> <span class="n">MagicMock</span><span class="p">()</span>
+<span class="c1"># sphinx_gallery_end_ignore</span>
+<span class="n">arduino_project</span><span class="o">.</span><span class="n">build</span><span class="p">()</span>
+<span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Compilation succeeded!&quot;</span><span class="p">)</span>
+</pre></div>
+</div>
+<p class="sphx-glr-script-out">Out:</p>
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Compilation succeeded!
+</pre></div>
+</div>
+</div>
+</div>
+<div class="section" id="uploading-to-our-device">
+<h2>Uploading to Our Device<a class="headerlink" href="#uploading-to-our-device" title="Permalink to this headline">¶</a></h2>
+<p>The very last step is uploading our sketch to an Arduino to make sure our code works properly.
+Unfortunately, we can’t do that from Google Colab, so we’ll have to download our sketch. This is
+simple enough to do - we’ll just turn our project into a <cite>.zip</cite> archive, and call <cite>files.download</cite>.
+If you’re running on Google Colab, you’ll have to uncomment the last two lines to download the file
+after writing it.</p>
+<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">ZIP_FOLDER</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/project&quot;</span>
+<span class="n">shutil</span><span class="o">.</span><span class="n">make_archive</span><span class="p">(</span><span class="n">ZIP_FOLDER</span><span class="p">,</span> <span class="s2">&quot;zip&quot;</span><span class="p">,</span> <span class="n">ZIP_FOLDER</span><span class="p">)</span>
+<span class="c1"># from google.colab import files</span>
+<span class="c1"># files.download(f&quot;{FOLDER}/models/project.zip&quot;)</span>
+<span class="c1"># sphinx_gallery_start_ignore</span>
+<span class="c1"># Run a few unit tests to make sure the Python code worked</span>
+
+<span class="c1"># Ensure transfer learn model was correctly assembled</span>
+<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">layers</span><span class="p">)</span> <span class="o">==</span> <span class="mi">5</span>
+<span class="k">assert</span> <span class="n">model</span><span class="o">.</span><span class="n">count_params</span><span class="p">()</span> <span class="o">==</span> <span class="mi">219058</span>  <span class="c1"># Only 219,058 of these are trainable</span>
+
+<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantized_model</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">250000</span>  <span class="c1"># Quantized model will be 250 KB - 350 KB</span>
+<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">quantized_model</span><span class="p">)</span> <span class="o">&lt;=</span> <span class="mi">350000</span>  <span class="c1"># Exact value depends on quantization</span>
+
+<span class="c1"># Assert .tflite and .zip files were written to disk</span>
+<span class="k">assert</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/quantized.tflite&quot;</span><span class="p">)</span>
+<span class="k">assert</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isfile</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">FOLDER</span><span class="si">}</span><span class="s2">/models/project.zip&quot;</span><span class="p">)</span>
+
+<span class="c1"># Assert MLF file was correctly generated</span>
+<span class="k">assert</span> <span class="nb">str</span><span class="p">(</span><span class="n">mod</span><span class="o">.</span><span class="n">executor</span><span class="p">)</span> <span class="o">==</span> <span class="s2">&quot;aot&quot;</span>
+
+<span class="c1"># Remove the temporary folder we generated at the beginning</span>
+<span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">FOLDER</span><span class="p">)</span>
+<span class="c1"># sphinx_gallery_end_ignore</span>
+</pre></div>
+</div>
+<p>From here, we’ll need to open it in the Arduino IDE. You’ll have to download the IDE as well as
+the SDK for whichever board you are using. For certain boards like the Sony SPRESENSE, you may
+have to change settings to control how much memory you want the board to use.</p>
+<div class="section" id="expected-results">
+<h3>Expected Results<a class="headerlink" href="#expected-results" title="Permalink to this headline">¶</a></h3>
+<p>If all works as expected, you should see the following output on a Serial monitor:</p>
+<blockquote>
+<div><div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">Car</span> <span class="n">results</span><span class="p">:</span>
+<span class="mi">255</span><span class="p">,</span> <span class="mi">0</span>
+<span class="n">Other</span> <span class="nb">object</span> <span class="n">results</span><span class="p">:</span>
+<span class="mi">0</span><span class="p">,</span> <span class="mi">255</span>
+</pre></div>
+</div>
+</div></blockquote>
+<p>The first number represents the model’s confidence that the object <strong>is</strong> a car and ranges from
+0-255. The second number represents the model’s confidence that the object <strong>is not</strong> a car and
+is also 0-255. These results mean the model is very sure that the first image is a car, and the
+second image is not (which is correct). Hence, our model is working!</p>
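+<p>If you would rather see human-readable percentages on the serial monitor, the raw
+<code class="docutils literal notranslate"><span class="pre">uint8_t</span></code> scores are easy to rescale - a small optional addition to the
+sketch above:</p>
+<div class="highlight-c notranslate"><div class="highlight"><pre><span></span>// Rescale a raw 0-255 score to a 0-100 percentage for display
+float car_confidence = 100.0f * result_data[0] / 255.0f;
+Serial.print(car_confidence); Serial.println(&quot;% sure this is a car&quot;);
+</pre></div>
+</div>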
+</div>
+</div>
+<div class="section" id="summary">
+<h2>Summary<a class="headerlink" href="#summary" title="Permalink to this headline">¶</a></h2>
+<p>In this tutorial, we used transfer learning to quickly train an image recognition model to
+identify cars. We modified its input dimensions and last few layers to make it better at this,
+and to make it faster and smaller. We then quantized the model and compiled it using TVM to
+create an Arduino sketch. Lastly, we tested the model using two static images to prove it works
+as intended.</p>
+<div class="section" id="next-steps">
+<h3>Next Steps<a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h3>
+<p>From here, we could modify the model to read live images from the camera - we have another
+Arduino tutorial for how to do that <a class="reference external" href="https://github.com/guberti/tvm-arduino-demos/tree/master/examples/person_detection">on GitHub</a>. Alternatively, we could also
+<a class="reference external" href="https://tvm.apache.org/docs/how_to/work_with_microtvm/micro_autotune.html">use TVM’s autotuning capabilities</a> to dramatically improve the model’s performance.</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 4 minutes  7.242 seconds)</p>
+<div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-how-to-work-with-microtvm-micro-train-py">
+<div class="sphx-glr-download docutils container">
+<p><a class="reference download internal" download="" href="../../_downloads/b52cec46baf4f78d6bcd94cbe269c8a6/micro_train.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">micro_train.py</span></code></a></p>
+</div>
+<div class="sphx-glr-download docutils container">
+<p><a class="reference download internal" download="" href="../../_downloads/a7c7ea4b5017ae70db1f51dd8e6dcd82/micro_train.ipynb"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Jupyter</span> <span class="pre">notebook:</span> <span class="pre">micro_train.ipynb</span></code></a></p>
+</div>
+</div>
+<p class="sphx-glr-signature"><a class="reference external" href="https://sphinx-gallery.github.io">Gallery generated by Sphinx-Gallery</a></p>
+</div>
+</div>
+</div>
+
+
+           </div>
+           
+          </div>
+          
+
+<footer>
+
+    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
+      
+        <a href="micro_tvmc.html" class="btn btn-neutral float-right" title="Executing a Tiny Model with TVMC Micro" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
+      
+      
+        <a href="micro_tflite.html" class="btn btn-neutral float-left" title="microTVM with TFLite Models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
+      
+    </div>
+
+<div id="button" class="backtop"><img src="../../_static//img/right.svg" alt="backtop"/> </div>
+<section class="footerSec">
+    <div class="footerHeader">
+      <ul class="d-flex align-md-items-center justify-content-between flex-column flex-md-row">
+        <li class="copywrite d-flex align-items-center">
+          <h5 id="copy-right-info">© 2020 Apache Software Foundation | All rights reserved</h5>
+        </li>
+      </ul>
+
+    </div>
+
+    <ul>
+      <li class="footernote">Copyright © 2020 The Apache Software Foundation. Apache TVM, Apache, the Apache feather, and the Apache TVM project logo are either trademarks or registered trademarks of the Apache Software Foundation.</li>
+    </ul>
+
+</section>
+</footer>
+        </div>
+      </div>
+
+    </section>
+
+  </div>
+  
+
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
+    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
+
+  </body>
+  <script type="text/javascript">
+      jQuery(function () {
+          SphinxRtdTheme.Navigation.enable(true);
+      });
+  </script>
+
+  
+  
+    
+    <!-- Theme Analytics -->
+    <script>
+    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+      (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+      m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
+
+    ga('create', 'UA-75982049-2', 'auto');
+    ga('send', 'pageview');
+    </script>
+
+    
+   
+
+</body>
+</html>
\ No newline at end of file
diff --git a/docs/how_to/work_with_microtvm/micro_tvmc.html b/docs/how_to/work_with_microtvm/micro_tvmc.html
index 2767c6410..db7387dc4 100644
--- a/docs/how_to/work_with_microtvm/micro_tvmc.html
+++ b/docs/how_to/work_with_microtvm/micro_tvmc.html
@@ -46,7 +46,7 @@
     <link rel="index" title="Index" href="../../genindex.html" />
     <link rel="search" title="Search" href="../../search.html" />
     <link rel="next" title="Extend TVM" href="../extend_tvm/index.html" />
-    <link rel="prev" title="microTVM with TFLite Models" href="micro_tflite.html" /> 
+    <link rel="prev" title="Training Vision Models for microTVM on Arduino" href="micro_train.html" /> 
 </head>
 
 <body class="wy-body-for-nav">
@@ -215,6 +215,7 @@
 <li class="toctree-l3"><a class="reference internal" href="micro_ethosu.html">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_reference_vm.html">microTVM Reference Virtual Machines</a></li>
 <li class="toctree-l3"><a class="reference internal" href="micro_tflite.html">microTVM with TFLite Models</a></li>
+<li class="toctree-l3"><a class="reference internal" href="micro_train.html">Training Vision Models for microTVM on Arduino</a></li>
 <li class="toctree-l3 current"><a class="current reference internal" href="#">Executing a Tiny Model with TVMC Micro</a><ul>
 <li class="toctree-l4"><a class="reference internal" href="#using-tvmc-micro">Using TVMC Micro</a></li>
 <li class="toctree-l4"><a class="reference internal" href="#obtain-a-tiny-model">Obtain a Tiny Model</a></li>
@@ -510,7 +511,7 @@ values using <code class="docutils literal notranslate"><span class="pre">Graph<
         <a href="../extend_tvm/index.html" class="btn btn-neutral float-right" title="Extend TVM" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
       
       
-        <a href="micro_tflite.html" class="btn btn-neutral float-left" title="microTVM with TFLite Models" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
+        <a href="micro_train.html" class="btn btn-neutral float-left" title="Training Vision Models for microTVM on Arduino" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
       
     </div>
 
diff --git a/docs/how_to/work_with_microtvm/sg_execution_times.html b/docs/how_to/work_with_microtvm/sg_execution_times.html
index b5917b7ab..b50883df1 100644
--- a/docs/how_to/work_with_microtvm/sg_execution_times.html
+++ b/docs/how_to/work_with_microtvm/sg_execution_times.html
@@ -300,13 +300,14 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-microtvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:45.476</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
+<p><strong>04:53.618</strong> total execution time for <strong>how_to_work_with_microtvm</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:41.207</strong>: <a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></li>
-<li><p><strong>00:03.698</strong>: <a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></li>
-<li><p><strong>00:00.198</strong>: <a class="reference internal" href="micro_ethosu.html#sphx-glr-how-to-work-with-microtvm-micro-ethosu-py"><span class="std std-ref">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_ethosu.py</span></code>)</p></li>
-<li><p><strong>00:00.193</strong>: <a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tvmc.py</span></code>)</p></li>
-<li><p><strong>00:00.180</strong>: <a class="reference internal" href="micro_reference_vm.html#sphx-glr-how-to-work-with-microtvm-micro-reference-vm-py"><span class="std std-ref">microTVM Reference Virtual Machines</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_reference_vm.py</span></code>)</p></li>
+<li><p><strong>04:07.242</strong>: <a class="reference internal" href="micro_train.html#sphx-glr-how-to-work-with-microtvm-micro-train-py"><span class="std std-ref">Training Vision Models for microTVM on Arduino</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_train.py</span></code>)</p></li>
+<li><p><strong>00:41.830</strong>: <a class="reference internal" href="micro_autotune.html#sphx-glr-how-to-work-with-microtvm-micro-autotune-py"><span class="std std-ref">Autotuning with microTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_autotune.py</span></code>)</p></li>
+<li><p><strong>00:03.931</strong>: <a class="reference internal" href="micro_tflite.html#sphx-glr-how-to-work-with-microtvm-micro-tflite-py"><span class="std std-ref">microTVM with TFLite Models</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tflite.py</span></code>)</p></li>
+<li><p><strong>00:00.208</strong>: <a class="reference internal" href="micro_tvmc.html#sphx-glr-how-to-work-with-microtvm-micro-tvmc-py"><span class="std std-ref">Executing a Tiny Model with TVMC Micro</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_tvmc.py</span></code>)</p></li>
+<li><p><strong>00:00.205</strong>: <a class="reference internal" href="micro_ethosu.html#sphx-glr-how-to-work-with-microtvm-micro-ethosu-py"><span class="std std-ref">Running TVM on bare metal Arm(R) Cortex(R)-M55 CPU and Ethos(TM)-U55 NPU with CMSIS-NN</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_ethosu.py</span></code>)</p></li>
+<li><p><strong>00:00.202</strong>: <a class="reference internal" href="micro_reference_vm.html#sphx-glr-how-to-work-with-microtvm-micro-reference-vm-py"><span class="std std-ref">microTVM Reference Virtual Machines</span></a> (<code class="docutils literal notranslate"><span class="pre">micro_reference_vm.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/work_with_relay/sg_execution_times.html b/docs/how_to/work_with_relay/sg_execution_times.html
index 362426ea9..6ed8c736f 100644
--- a/docs/how_to/work_with_relay/sg_execution_times.html
+++ b/docs/how_to/work_with_relay/sg_execution_times.html
@@ -300,11 +300,11 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-relay-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:06.408</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
+<p><strong>00:11.833</strong> total execution time for <strong>how_to_work_with_relay</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:04.518</strong>: <a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></li>
-<li><p><strong>00:01.686</strong>: <a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></li>
-<li><p><strong>00:00.203</strong>: <a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></li>
+<li><p><strong>00:09.937</strong>: <a class="reference internal" href="using_external_lib.html#sphx-glr-how-to-work-with-relay-using-external-lib-py"><span class="std std-ref">Using External Libraries in Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_external_lib.py</span></code>)</p></li>
+<li><p><strong>00:01.677</strong>: <a class="reference internal" href="build_gcn.html#sphx-glr-how-to-work-with-relay-build-gcn-py"><span class="std std-ref">Building a Graph Convolutional Network</span></a> (<code class="docutils literal notranslate"><span class="pre">build_gcn.py</span></code>)</p></li>
+<li><p><strong>00:00.219</strong>: <a class="reference internal" href="using_relay_viz.html#sphx-glr-how-to-work-with-relay-using-relay-viz-py"><span class="std std-ref">Use Relay Visualizer to Visualize Relay</span></a> (<code class="docutils literal notranslate"><span class="pre">using_relay_viz.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/work_with_schedules/sg_execution_times.html b/docs/how_to/work_with_schedules/sg_execution_times.html
index 4610709c8..be03c34b2 100644
--- a/docs/how_to/work_with_schedules/sg_execution_times.html
+++ b/docs/how_to/work_with_schedules/sg_execution_times.html
@@ -300,16 +300,16 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-how-to-work-with-schedules-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:05.799</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
+<p><strong>00:05.588</strong> total execution time for <strong>how_to_work_with_schedules</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:02.158</strong>: <a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></li>
-<li><p><strong>00:01.248</strong>: <a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></li>
-<li><p><strong>00:00.739</strong>: <a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></li>
-<li><p><strong>00:00.719</strong>: <a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></li>
+<li><p><strong>00:02.069</strong>: <a class="reference internal" href="intrin_math.html#sphx-glr-how-to-work-with-schedules-intrin-math-py"><span class="std std-ref">Intrinsics and Math Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">intrin_math.py</span></code>)</p></li>
+<li><p><strong>00:01.121</strong>: <a class="reference internal" href="tensorize.html#sphx-glr-how-to-work-with-schedules-tensorize-py"><span class="std std-ref">Use Tensorize to Leverage Hardware Intrinsics</span></a> (<code class="docutils literal notranslate"><span class="pre">tensorize.py</span></code>)</p></li>
+<li><p><strong>00:00.734</strong>: <a class="reference internal" href="reduction.html#sphx-glr-how-to-work-with-schedules-reduction-py"><span class="std std-ref">Reduction</span></a> (<code class="docutils literal notranslate"><span class="pre">reduction.py</span></code>)</p></li>
+<li><p><strong>00:00.692</strong>: <a class="reference internal" href="scan.html#sphx-glr-how-to-work-with-schedules-scan-py"><span class="std std-ref">Scan and Recurrent Kernel</span></a> (<code class="docutils literal notranslate"><span class="pre">scan.py</span></code>)</p></li>
 <li><p><strong>00:00.290</strong>: <a class="reference internal" href="extern_op.html#sphx-glr-how-to-work-with-schedules-extern-op-py"><span class="std std-ref">External Tensor Functions</span></a> (<code class="docutils literal notranslate"><span class="pre">extern_op.py</span></code>)</p></li>
-<li><p><strong>00:00.223</strong>: <a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></li>
-<li><p><strong>00:00.219</strong>: <a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></li>
-<li><p><strong>00:00.204</strong>: <a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></li>
+<li><p><strong>00:00.233</strong>: <a class="reference internal" href="schedule_primitives.html#sphx-glr-how-to-work-with-schedules-schedule-primitives-py"><span class="std std-ref">Schedule Primitives in TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">schedule_primitives.py</span></code>)</p></li>
+<li><p><strong>00:00.225</strong>: <a class="reference internal" href="tuple_inputs.html#sphx-glr-how-to-work-with-schedules-tuple-inputs-py"><span class="std std-ref">Compute and Reduce with Tuple Inputs</span></a> (<code class="docutils literal notranslate"><span class="pre">tuple_inputs.py</span></code>)</p></li>
+<li><p><strong>00:00.224</strong>: <a class="reference internal" href="tedd.html#sphx-glr-how-to-work-with-schedules-tedd-py"><span class="std std-ref">Use Tensor Expression Debug Display (TEDD) for Visualization</span></a> (<code class="docutils literal notranslate"><span class="pre">tedd.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/how_to/work_with_schedules/tensorize.html b/docs/how_to/work_with_schedules/tensorize.html
index 99099d0c7..4e8ecb516 100644
--- a/docs/how_to/work_with_schedules/tensorize.html
+++ b/docs/how_to/work_with_schedules/tensorize.html
@@ -552,7 +552,7 @@ The import needs to happen before the tensorized GEMV is executed.</p>
              C: Buffer(C_2: Pointer(float32), float32, [524288], [])}
   buffer_map = {A_1: A, B_1: B, C_1: C}
   preflattened_buffer_map = {A_1: A_3: Buffer(A_2, float32, [1024, 64], []), B_1: B_3: Buffer(B_2, float32, [512, 64], []), C_1: C_3: Buffer(C_2, float32, [1024, 512], [])} {
-  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmp5oria4hu/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmp5oria4hu/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
+  attr [IterVar(i: int32, (nullptr), &quot;DataPar&quot;, &quot;&quot;)] &quot;pragma_import_llvm&quot; = &quot;; ModuleID = &#39;/tmp/tmp992t4uwf/input0.cc&#39;\nsource_filename = \&quot;/tmp/tmp992t4uwf/input0.cc\&quot;\ntarget datalayout = \&quot;e-m:e-i64:64-f80:128-n8:16:32:64-S128\&quot;\ntarget triple = \&quot;x86_64-pc-linux-gnu\&quot;\n\n; Function Attrs: noinline nounwind optnone uwtable\ndefine dso_local i32 @gemv_update(float*, float*, float*, i32, i32, i32) #0 {\n  %7 = allo [...]
   for (i, 0, 1024) {
     for (j.outer: int32, 0, 32) {
       @tir.call_extern(&quot;gemv_update&quot;, @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), C_2, ((i*512) + (j.outer*16)), 16, 2, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), A_2, (i*64), 64, 1, dtype=handle), @tir.tvm_access_ptr(@tir.type_annotation(, dtype=float32), B_2, (j.outer*1024), 1024, 1, dtype=handle), 16, 64, 64, dtype=int32)
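
(Aside: only the /tmp scratch path changes in this hunk; the "pragma_import_llvm" attribute itself is how the tensorize tutorial splices hand-written LLVM IR for gemv_update under a tensorized loop. A minimal sketch of that pattern, mirroring the tutorial; the IR string here is a placeholder assumption, which is safe because lowering stops before tvm.build and the string is never parsed:

    import tvm
    from tvm import te

    # Shapes match the printed TIR: C[1024, 512] = A[1024, 64] x B[512, 64]^T.
    N, M, L = 1024, 512, 64

    def intrin_gemv(m, l):
        a = te.placeholder((l,), name="a")
        b = te.placeholder((m, l), name="b")
        k = te.reduce_axis((0, l), name="k")
        c = te.compute((m,), lambda i: te.sum(a[k] * b[i, k], axis=k), name="c")
        Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="A", offset_factor=1, strides=[1])
        Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="B", offset_factor=1,
                                 strides=[te.var("s1"), 1])
        Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="C", offset_factor=1, strides=[1])

        def intrin_func(ins, outs):
            ib = tvm.tir.ir_builder.create()
            aa, bb = ins
            cc = outs[0]
            # Replace each 16-wide output tile with a call into the extern kernel,
            # exactly the @tir.call_extern("gemv_update", ...) shown above.
            ib.emit(tvm.tir.call_extern("int32", "gemv_update",
                                        cc.access_ptr("w"), aa.access_ptr("r"),
                                        bb.access_ptr("r"), m, l, bb.strides[0]))
            return ib.get()

        return te.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})

    A = te.placeholder((N, L), name="A")
    B = te.placeholder((M, L), name="B")
    k = te.reduce_axis((0, L), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[j, k], axis=k), name="C")

    s = te.create_schedule(C.op)
    x, y = C.op.axis
    yo, yi = s[C].split(y, factor=16)
    s[C].tensorize(yi, intrin_gemv(16, L))
    # Placeholder: the tutorial compiles a C gemv_update() to real LLVM IR with clang.
    gemv_ir = "; LLVM IR implementing gemv_update goes here"
    s[C].pragma(x, "import_llvm", gemv_ir)
    print(tvm.lower(s, [A, B, C], simple_mode=True))

This produces the kind of TIR shown in the hunk; tvm.build would additionally require real IR in the pragma.)
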
diff --git a/docs/objects.inv b/docs/objects.inv
index 37eeb7e9c..6a81db25d 100644
Binary files a/docs/objects.inv and b/docs/objects.inv differ
diff --git a/docs/reference/api/python/auto_scheduler.html b/docs/reference/api/python/auto_scheduler.html
index 624f9560f..1d7d72886 100644
--- a/docs/reference/api/python/auto_scheduler.html
+++ b/docs/reference/api/python/auto_scheduler.html
@@ -1715,7 +1715,7 @@ Can be a function or the function name.</p></li>
 
 <dl class="py function">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.auto_schedule">
-<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
+<span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">auto_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">search_policy</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em clas [...]
 <dd><p>THIS API IS DEPRECATED.</p>
 <p>Run auto scheduling search for a task.</p>
 <dl class="field-list simple">
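
(Aside: since auto_schedule is deprecated, the supported entry point in this release is SearchTask.tune, optionally with an explicit search policy; a hedged sketch of that replacement appears after the SketchPolicy entry below.)
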
@@ -1752,7 +1752,7 @@ the initial naive schedule (state).</p>
 
 <dl class="py class">
 <dt class="sig sig-object py" id="tvm.auto_scheduler.SketchPolicy">
-<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
+<em class="property"><span class="pre">class</span> </em><span class="sig-prename descclassname"><span class="pre">tvm.auto_scheduler.</span></span><span class="sig-name descname"><span class="pre">SketchPolicy</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">task</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">program_cost_model</span></span><span class="o"><span class="pre">=</span></span><span class="defau [...]
 <dd><p>The search policy that searches in a hierarchical search space defined by sketches.
 The policy randomly samples programs from the space defined by sketches and uses evolutionary
 search to fine-tune them.</p>
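
(Aside: a minimal sketch tying the two entries above together, constructing a SketchPolicy explicitly and driving it through the non-deprecated SearchTask.tune. The workload, shapes, and log-file name are illustrative assumptions, not part of this commit:

    import tvm
    from tvm import te, auto_scheduler

    @auto_scheduler.register_workload
    def matmul(N, L, M, dtype):
        A = te.placeholder((N, L), name="A", dtype=dtype)
        B = te.placeholder((L, M), name="B", dtype=dtype)
        k = te.reduce_axis((0, L), name="k")
        C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
        return [A, B, C]

    task = auto_scheduler.SearchTask(
        func=matmul, args=(128, 128, 128, "float32"), target=tvm.target.Target("llvm")
    )
    # Sample programs from the sketch-defined space, then refine them with
    # evolutionary search guided by a learned cost model.
    policy = auto_scheduler.SketchPolicy(task, program_cost_model=auto_scheduler.XGBModel())
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=64,
        measure_callbacks=[auto_scheduler.RecordToFile("matmul.json")],
    )
    task.tune(tune_option, search_policy=policy)  # replaces deprecated auto_schedule()
    sch, args = task.apply_best("matmul.json")

Passing search_policy is optional; when none is given, tune() falls back to a default SketchPolicy.)
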
diff --git a/docs/reference/api/typedoc/classes/bytestreamreader.html b/docs/reference/api/typedoc/classes/bytestreamreader.html
index ab6d9d5b4..f993feb99 100644
--- a/docs/reference/api/typedoc/classes/bytestreamreader.html
+++ b/docs/reference/api/typedoc/classes/bytestreamreader.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -141,7 +141,7 @@
 					<div class="tsd-signature tsd-kind-icon">bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Uint8Array</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L43">rpc_server.ts:43</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -151,7 +151,7 @@
 					<div class="tsd-signature tsd-kind-icon">offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L42">rpc_server.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -168,7 +168,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L63">rpc_server.ts:63</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">Uint8Array</span></h4>
@@ -185,7 +185,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L49">rpc_server.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -202,7 +202,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L57">rpc_server.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/cachedcallstack.html b/docs/reference/api/typedoc/classes/cachedcallstack.html
index 11a211d70..78934f131 100644
--- a/docs/reference/api/typedoc/classes/cachedcallstack.html
+++ b/docs/reference/api/typedoc/classes/cachedcallstack.html
@@ -144,7 +144,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L223">memory.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L223">memory.ts:223</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -172,7 +172,7 @@
 					<div class="tsd-signature tsd-kind-icon">temp<wbr>Args<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><a href="../interfaces/disposable.html" class="tsd-signature-type">Disposable</a><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L208">memory.ts:208</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L208">memory.ts:208</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -194,7 +194,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L312">memory.ts:312</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L312">memory.ts:312</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L284">memory.ts:284</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L284">memory.ts:284</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -262,7 +262,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L388">memory.ts:388</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L388">memory.ts:388</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -300,7 +300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L376">memory.ts:376</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L376">memory.ts:376</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -340,7 +340,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L267">memory.ts:267</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L267">memory.ts:267</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -373,7 +373,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L243">memory.ts:243</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L243">memory.ts:243</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -390,7 +390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L321">memory.ts:321</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L321">memory.ts:321</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -422,7 +422,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L252">memory.ts:252</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L252">memory.ts:252</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -444,7 +444,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L359">memory.ts:359</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L359">memory.ts:359</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -470,7 +470,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L342">memory.ts:342</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L342">memory.ts:342</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -496,7 +496,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L350">memory.ts:350</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L350">memory.ts:350</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -522,7 +522,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L326">memory.ts:326</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L326">memory.ts:326</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -548,7 +548,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L363">memory.ts:363</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L363">memory.ts:363</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -574,7 +574,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L346">memory.ts:346</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L346">memory.ts:346</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -600,7 +600,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L334">memory.ts:334</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L334">memory.ts:334</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
diff --git a/docs/reference/api/typedoc/classes/dldatatype.html b/docs/reference/api/typedoc/classes/dldatatype.html
index 8332f84f3..7c6c64853 100644
--- a/docs/reference/api/typedoc/classes/dldatatype.html
+++ b/docs/reference/api/typedoc/classes/dldatatype.html
@@ -119,7 +119,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">bits<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L260">runtime.ts:260</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L260">runtime.ts:260</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">code<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L258">runtime.ts:258</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L258">runtime.ts:258</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -177,7 +177,7 @@
 					<div class="tsd-signature tsd-kind-icon">lanes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L262">runtime.ts:262</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L262">runtime.ts:262</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -199,7 +199,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L279">runtime.ts:279</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L279">runtime.ts:279</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -216,7 +216,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L270">runtime.ts:270</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L270">runtime.ts:270</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/dldevice.html b/docs/reference/api/typedoc/classes/dldevice.html
index fb234504c..78e9c94c6 100644
--- a/docs/reference/api/typedoc/classes/dldevice.html
+++ b/docs/reference/api/typedoc/classes/dldevice.html
@@ -118,7 +118,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L202">runtime.ts:202</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L202">runtime.ts:202</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L200">runtime.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L200">runtime.ts:200</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -161,7 +161,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L198">runtime.ts:198</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L198">runtime.ts:198</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -183,7 +183,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L223">runtime.ts:223</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L223">runtime.ts:223</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -205,7 +205,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L230">runtime.ts:230</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L230">runtime.ts:230</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">string</span></h4>
diff --git a/docs/reference/api/typedoc/classes/environment.html b/docs/reference/api/typedoc/classes/environment.html
index 611dd7b02..3e2d709ef 100644
--- a/docs/reference/api/typedoc/classes/environment.html
+++ b/docs/reference/api/typedoc/classes/environment.html
@@ -125,7 +125,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L86">environment.ts:86</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L86">environment.ts:86</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -169,7 +169,7 @@
 					<aside class="tsd-sources">
 						<p>Implementation of <a href="../interfaces/libraryprovider.html">LibraryProvider</a>.<a href="../interfaces/libraryprovider.html#imports">imports</a></p>
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L70">environment.ts:70</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L70">environment.ts:70</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L69">environment.ts:69</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L69">environment.ts:69</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -210,7 +210,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">ctypes.FTVMWasmPackedCFunc</span><span class="tsd-signature-symbol"> | </span><span class="tsd-signature-type">undefined</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = [undefined,]</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L78">environment.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L78">environment.ts:78</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -228,7 +228,7 @@
 					<div class="tsd-signature tsd-kind-icon">packedCFunc<wbr>Table<wbr>Free<wbr>Id<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span><span class="tsd-signature-symbol"> = []</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L84">environment.ts:84</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L84">environment.ts:84</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -250,7 +250,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L105">environment.ts:105</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L105">environment.ts:105</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ffilibrary.html b/docs/reference/api/typedoc/classes/ffilibrary.html
index 8e70abc43..ee7e3d296 100644
--- a/docs/reference/api/typedoc/classes/ffilibrary.html
+++ b/docs/reference/api/typedoc/classes/ffilibrary.html
@@ -131,7 +131,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L49">runtime.ts:49</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L49">runtime.ts:49</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L46">runtime.ts:46</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L46">runtime.ts:46</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L45">runtime.ts:45</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L45">runtime.ts:45</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L44">runtime.ts:44</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L44">runtime.ts:44</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">webGPUContext<span class="tsd-signature-symbol">:</span> <a href="webgpucontext.html" class="tsd-signature-type">WebGPUContext</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L47">runtime.ts:47</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L47">runtime.ts:47</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -203,7 +203,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L76">runtime.ts:76</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L76">runtime.ts:76</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -226,7 +226,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L66">runtime.ts:66</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L66">runtime.ts:66</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -243,7 +243,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L84">runtime.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L84">runtime.ts:84</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <a href="cachedcallstack.html" class="tsd-signature-type">CachedCallStack</a></h4>
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L95">runtime.ts:95</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L95">runtime.ts:95</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -283,7 +283,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L72">runtime.ts:72</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L72">runtime.ts:72</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
diff --git a/docs/reference/api/typedoc/classes/graphexecutor.html b/docs/reference/api/typedoc/classes/graphexecutor.html
index 2e8e2e61a..ec1beddb3 100644
--- a/docs/reference/api/typedoc/classes/graphexecutor.html
+++ b/docs/reference/api/typedoc/classes/graphexecutor.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L583">runtime.ts:583</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L583">runtime.ts:583</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">module<span class="tsd-signature-symbol">:</span> <a href="module.html" class="tsd-signature-type">Module</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L579">runtime.ts:579</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L579">runtime.ts:579</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L654">runtime.ts:654</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L654">runtime.ts:654</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -224,7 +224,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L597">runtime.ts:597</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L597">runtime.ts:597</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -241,7 +241,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L631">runtime.ts:631</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L631">runtime.ts:631</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L644">runtime.ts:644</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L644">runtime.ts:644</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -310,7 +310,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L621">runtime.ts:621</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L621">runtime.ts:621</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -332,7 +332,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L609">runtime.ts:609</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L609">runtime.ts:609</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/instance.html b/docs/reference/api/typedoc/classes/instance.html
index 0e0de5c77..2c6346cf6 100644
--- a/docs/reference/api/typedoc/classes/instance.html
+++ b/docs/reference/api/typedoc/classes/instance.html
@@ -139,7 +139,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L692">runtime.ts:692</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L692">runtime.ts:692</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -202,7 +202,7 @@
 					<div class="tsd-signature tsd-kind-icon">exports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">Function</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L684">runtime.ts:684</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L684">runtime.ts:684</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -212,7 +212,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L683">runtime.ts:683</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L683">runtime.ts:683</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -229,7 +229,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L932">runtime.ts:932</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L932">runtime.ts:932</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -260,7 +260,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L994">runtime.ts:994</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L994">runtime.ts:994</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -303,7 +303,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L924">runtime.ts:924</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L924">runtime.ts:924</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -341,7 +341,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L732">runtime.ts:732</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L732">runtime.ts:732</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -358,7 +358,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L952">runtime.ts:952</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L952">runtime.ts:952</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -402,7 +402,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L816">runtime.ts:816</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L816">runtime.ts:816</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -434,7 +434,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L1033">runtime.ts:1033</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -465,7 +465,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L846">runtime.ts:846</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L846">runtime.ts:846</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -497,7 +497,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L750">runtime.ts:750</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L750">runtime.ts:750</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -520,7 +520,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L1013">runtime.ts:1013</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -568,7 +568,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L789">runtime.ts:789</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L789">runtime.ts:789</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -608,7 +608,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L914">runtime.ts:914</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L914">runtime.ts:914</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -646,7 +646,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L1134">runtime.ts:1134</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L1134">runtime.ts:1134</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -698,7 +698,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L740">runtime.ts:740</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L740">runtime.ts:740</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -722,7 +722,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L868">runtime.ts:868</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L868">runtime.ts:868</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -754,7 +754,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L857">runtime.ts:857</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L857">runtime.ts:857</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -786,7 +786,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L940">runtime.ts:940</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L940">runtime.ts:940</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/memory.html b/docs/reference/api/typedoc/classes/memory.html
index d26ab652d..68a50138c 100644
--- a/docs/reference/api/typedoc/classes/memory.html
+++ b/docs/reference/api/typedoc/classes/memory.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L40">memory.ts:40</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L40">memory.ts:40</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Memory</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L32">memory.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L32">memory.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -162,7 +162,7 @@
 					<div class="tsd-signature tsd-kind-icon">wasm32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">boolean</span><span class="tsd-signature-symbol"> = true</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L33">memory.ts:33</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L33">memory.ts:33</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -179,7 +179,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L154">memory.ts:154</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L154">memory.ts:154</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -210,7 +210,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L90">memory.ts:90</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L90">memory.ts:90</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -233,7 +233,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L97">memory.ts:97</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L97">memory.ts:97</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -256,7 +256,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L74">memory.ts:74</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L74">memory.ts:74</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -279,7 +279,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L81">memory.ts:81</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L81">memory.ts:81</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -302,7 +302,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L104">memory.ts:104</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L104">memory.ts:104</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -325,7 +325,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L132">memory.ts:132</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L132">memory.ts:132</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -362,7 +362,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L145">memory.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L145">memory.ts:145</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -393,7 +393,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L60">memory.ts:60</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L60">memory.ts:60</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -416,7 +416,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L67">memory.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L67">memory.ts:67</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -439,7 +439,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L53">memory.ts:53</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L53">memory.ts:53</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -462,7 +462,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L114">memory.ts:114</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L114">memory.ts:114</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -485,7 +485,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L124">memory.ts:124</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L124">memory.ts:124</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">number</span></h4>
@@ -502,7 +502,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/memory.ts#L175">memory.ts:175</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/memory.ts#L175">memory.ts:175</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/module.html b/docs/reference/api/typedoc/classes/module.html
index 1aee16384..bdcf63194 100644
--- a/docs/reference/api/typedoc/classes/module.html
+++ b/docs/reference/api/typedoc/classes/module.html
@@ -124,7 +124,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L504">runtime.ts:504</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L504">runtime.ts:504</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L502">runtime.ts:502</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L502">runtime.ts:502</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -187,7 +187,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L516">runtime.ts:516</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L516">runtime.ts:516</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -204,7 +204,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L530">runtime.ts:530</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L530">runtime.ts:530</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -236,7 +236,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L561">runtime.ts:561</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L561">runtime.ts:561</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/ndarray.html b/docs/reference/api/typedoc/classes/ndarray.html
index e4a5308c7..bb68ef614 100644
--- a/docs/reference/api/typedoc/classes/ndarray.html
+++ b/docs/reference/api/typedoc/classes/ndarray.html
@@ -130,7 +130,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L304">runtime.ts:304</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L304">runtime.ts:304</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -158,7 +158,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <a href="dldevice.html" class="tsd-signature-type">DLDevice</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L297">runtime.ts:297</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L297">runtime.ts:297</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -173,7 +173,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L293">runtime.ts:293</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L293">runtime.ts:293</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -188,7 +188,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L289">runtime.ts:289</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L289">runtime.ts:289</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -203,7 +203,7 @@
 					<div class="tsd-signature tsd-kind-icon">ndim<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L291">runtime.ts:291</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L291">runtime.ts:291</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -218,7 +218,7 @@
 					<div class="tsd-signature tsd-kind-icon">shape<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L295">runtime.ts:295</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L295">runtime.ts:295</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -240,7 +240,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L370">runtime.ts:370</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L370">runtime.ts:370</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -273,7 +273,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L414">runtime.ts:414</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L414">runtime.ts:414</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -305,7 +305,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L355">runtime.ts:355</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L355">runtime.ts:355</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
@@ -322,7 +322,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L474">runtime.ts:474</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L474">runtime.ts:474</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -346,7 +346,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L443">runtime.ts:443</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L443">runtime.ts:443</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/packedfunccell.html b/docs/reference/api/typedoc/classes/packedfunccell.html
index 618fbc4de..d785d7728 100644
--- a/docs/reference/api/typedoc/classes/packedfunccell.html
+++ b/docs/reference/api/typedoc/classes/packedfunccell.html
@@ -122,7 +122,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L158">runtime.ts:158</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L158">runtime.ts:158</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -147,7 +147,7 @@
 					<div class="tsd-signature tsd-kind-icon">handle<span class="tsd-signature-symbol">:</span> <a href="../index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L157">runtime.ts:157</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L157">runtime.ts:157</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -164,7 +164,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L165">runtime.ts:165</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L165">runtime.ts:165</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-returns-title">Returns <span class="tsd-signature-type">void</span></h4>
diff --git a/docs/reference/api/typedoc/classes/rpcserver.html b/docs/reference/api/typedoc/classes/rpcserver.html
index 21937417b..9c142afc8 100644
--- a/docs/reference/api/typedoc/classes/rpcserver.html
+++ b/docs/reference/api/typedoc/classes/rpcserver.html
@@ -115,7 +115,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L92">rpc_server.ts:92</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">get<wbr>Imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">unknown</span><span class="tsd-signat [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L82">rpc_server.ts:82</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -201,7 +201,7 @@
 					<div class="tsd-signature tsd-kind-icon">key<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L78">rpc_server.ts:78</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -211,7 +211,7 @@
 					<div class="tsd-signature tsd-kind-icon">logger<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>msg<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L81">rpc_server.ts:81</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-type-declaration">
@@ -242,7 +242,7 @@
 					<div class="tsd-signature tsd-kind-icon">socket<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">WebSocket</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L79">rpc_server.ts:79</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -252,7 +252,7 @@
 					<div class="tsd-signature tsd-kind-icon">state<span class="tsd-signature-symbol">:</span> <a href="../enums/rpcserverstate.html" class="tsd-signature-type">RPCServerState</a><span class="tsd-signature-symbol"> = RPCServerState.InitHeader</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L80">rpc_server.ts:80</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -262,7 +262,7 @@
 					<div class="tsd-signature tsd-kind-icon">url<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L77">rpc_server.ts:77</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/classes/scalar.html b/docs/reference/api/typedoc/classes/scalar.html
index 8a78a855c..830b2c079 100644
--- a/docs/reference/api/typedoc/classes/scalar.html
+++ b/docs/reference/api/typedoc/classes/scalar.html
@@ -112,7 +112,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -137,7 +137,7 @@
 					<div class="tsd-signature tsd-kind-icon">dtype<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L145">runtime.ts:145</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L145">runtime.ts:145</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -152,7 +152,7 @@
 					<div class="tsd-signature tsd-kind-icon">value<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L143">runtime.ts:143</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L143">runtime.ts:143</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/classes/webgpucontext.html b/docs/reference/api/typedoc/classes/webgpucontext.html
index 969c69614..29f0e2a57 100644
--- a/docs/reference/api/typedoc/classes/webgpucontext.html
+++ b/docs/reference/api/typedoc/classes/webgpucontext.html
@@ -120,7 +120,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L57">webgpu.ts:57</a></li>
 								</ul>
 							</aside>
 							<h4 class="tsd-parameters-title">Parameters</h4>
@@ -145,7 +145,7 @@
 					<div class="tsd-signature tsd-kind-icon">device<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">GPUDevice</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L50">webgpu.ts:50</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -155,7 +155,7 @@
 					<div class="tsd-signature tsd-kind-icon">memory<span class="tsd-signature-symbol">:</span> <a href="memory.html" class="tsd-signature-type">Memory</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L51">webgpu.ts:51</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -172,7 +172,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L84">webgpu.ts:84</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -209,7 +209,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L170">webgpu.ts:170</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L67">webgpu.ts:67</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/enums/argtypecode.html b/docs/reference/api/typedoc/enums/argtypecode.html
index 0a0df2f34..b308a1ab9 100644
--- a/docs/reference/api/typedoc/enums/argtypecode.html
+++ b/docs/reference/api/typedoc/enums/argtypecode.html
@@ -106,7 +106,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 6</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L220">ctypes.ts:220</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -116,7 +116,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L216">ctypes.ts:216</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -126,7 +126,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L214">ctypes.ts:214</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -136,7 +136,7 @@
 					<div class="tsd-signature tsd-kind-icon">Null<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L218">ctypes.ts:218</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -146,7 +146,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 12</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L226">ctypes.ts:226</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -156,7 +156,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMDLTensor<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 7</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L221">ctypes.ts:221</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -166,7 +166,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L219">ctypes.ts:219</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -176,7 +176,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMModule<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 9</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L223">ctypes.ts:223</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -186,7 +186,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMNDArray<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 13</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L227">ctypes.ts:227</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -196,7 +196,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObject<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L222">ctypes.ts:222</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -206,7 +206,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMObjectRValue<wbr>Ref<wbr>Arg<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 14</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L228">ctypes.ts:228</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -216,7 +216,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMOpaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L217">ctypes.ts:217</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -226,7 +226,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMPacked<wbr>Func<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 10</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L224">ctypes.ts:224</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -236,7 +236,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 11</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L225">ctypes.ts:225</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -246,7 +246,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L215">ctypes.ts:215</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/aynccallbackcode.html b/docs/reference/api/typedoc/enums/aynccallbackcode.html
index 8a93e6fe1..4cf915689 100644
--- a/docs/reference/api/typedoc/enums/aynccallbackcode.html
+++ b/docs/reference/api/typedoc/enums/aynccallbackcode.html
@@ -93,7 +93,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Exception<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 5</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L676">runtime.ts:676</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L676">runtime.ts:676</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -103,7 +103,7 @@
 					<div class="tsd-signature tsd-kind-icon">k<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L675">runtime.ts:675</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L675">runtime.ts:675</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/dldatatypecode.html b/docs/reference/api/typedoc/enums/dldatatypecode.html
index a633f0d76..ae6fe786f 100644
--- a/docs/reference/api/typedoc/enums/dldatatypecode.html
+++ b/docs/reference/api/typedoc/enums/dldatatypecode.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">Float<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L242">runtime.ts:242</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L242">runtime.ts:242</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">Int<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 0</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L240">runtime.ts:240</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L240">runtime.ts:240</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">Opaque<wbr>Handle<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 3</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L243">runtime.ts:243</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L243">runtime.ts:243</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -125,7 +125,7 @@
 					<div class="tsd-signature tsd-kind-icon">UInt<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L241">runtime.ts:241</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L241">runtime.ts:241</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/rpcserverstate.html b/docs/reference/api/typedoc/enums/rpcserverstate.html
index dd1fdc1a0..0660e0c96 100644
--- a/docs/reference/api/typedoc/enums/rpcserverstate.html
+++ b/docs/reference/api/typedoc/enums/rpcserverstate.html
@@ -90,7 +90,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L27">rpc_server.ts:27</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Header<wbr>Key<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L28">rpc_server.ts:28</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">Init<wbr>Server<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L29">rpc_server.ts:29</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Body<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L32">rpc_server.ts:32</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">Receive<wbr>Packet<wbr>Header<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L31">rpc_server.ts:31</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">Wait<wbr>For<wbr>Callback<span class="tsd-signature-symbol">:</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L30">rpc_server.ts:30</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/enums/sizeof.html b/docs/reference/api/typedoc/enums/sizeof.html
index ba1a1eb9c..211847e6d 100644
--- a/docs/reference/api/typedoc/enums/sizeof.html
+++ b/docs/reference/api/typedoc/enums/sizeof.html
@@ -100,7 +100,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L206">ctypes.ts:206</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -110,7 +110,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLDevice<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = I32 + I32</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L207">ctypes.ts:207</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -120,7 +120,7 @@
 					<div class="tsd-signature tsd-kind-icon">F32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L203">ctypes.ts:203</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -130,7 +130,7 @@
 					<div class="tsd-signature tsd-kind-icon">F64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L204">ctypes.ts:204</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -140,7 +140,7 @@
 					<div class="tsd-signature tsd-kind-icon">I32<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 4</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L201">ctypes.ts:201</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -150,7 +150,7 @@
 					<div class="tsd-signature tsd-kind-icon">I64<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L202">ctypes.ts:202</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -160,7 +160,7 @@
 					<div class="tsd-signature tsd-kind-icon">TVMValue<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 8</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L205">ctypes.ts:205</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -170,7 +170,7 @@
 					<div class="tsd-signature tsd-kind-icon">U16<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 2</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L200">ctypes.ts:200</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -180,7 +180,7 @@
 					<div class="tsd-signature tsd-kind-icon">U8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol"> = 1</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L199">ctypes.ts:199</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/index.html b/docs/reference/api/typedoc/index.html
index a46106072..e5690cab8 100644
--- a/docs/reference/api/typedoc/index.html
+++ b/docs/reference/api/typedoc/index.html
@@ -174,7 +174,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Alloc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>shape<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, ndim<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeCode<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, dtypeBits<span class="tsd [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L112">ctypes.ts:112</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -238,7 +238,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>Bytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">num [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L128">ctypes.ts:128</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -282,7 +282,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>From<wbr>To<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>from<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, to<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-sig [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L144">ctypes.ts:144</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -326,7 +326,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Copy<wbr>ToBytes<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, data<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nbytes<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</sp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L136">ctypes.ts:136</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -370,7 +370,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMArray<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>handle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L121">ctypes.ts:121</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -406,7 +406,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMBackend<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number< [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L160">ctypes.ts:160</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -458,7 +458,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCFunc<wbr>Set<wbr>Return<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ret<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L77">ctypes.ts:77</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -506,7 +506,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMCb<wbr>Arg<wbr>ToReturn<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>value<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, code<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span c [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L83">ctypes.ts:83</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -545,7 +545,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Call<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, argValues<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCode<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-t [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L67">ctypes.ts:67</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -601,7 +601,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>func<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L57">ctypes.ts:57</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -637,7 +637,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Get<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span cla [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L100">ctypes.ts:100</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -676,7 +676,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>List<wbr>Global<wbr>Names<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>outSize<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, outArray<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L88">ctypes.ts:88</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -715,7 +715,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMFunc<wbr>Register<wbr>Global<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>name<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, f<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, override<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</spa [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L94">ctypes.ts:94</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -758,7 +758,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMGet<wbr>Last<wbr>Error<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L34">ctypes.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -788,7 +788,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Free<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L52">ctypes.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -824,7 +824,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Get<wbr>Function<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, funcName<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, queryImports<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">numbe [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L42">ctypes.ts:42</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -872,7 +872,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMMod<wbr>Import<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>mod<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, dep<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-si [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L48">ctypes.ts:48</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -912,7 +912,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMSynchronize<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>deviceType<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, deviceId<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, stream<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signatur [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L150">ctypes.ts:150</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -954,7 +954,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Alloc<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>size<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L167">ctypes.ts:167</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -990,7 +990,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Free<wbr>Space<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>ptr<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L170">ctypes.ts:170</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1026,7 +1026,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>Func<wbr>Create<wbr>FromCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resource<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, out<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&g [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L187">ctypes.ts:187</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1066,7 +1066,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>args<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, typeCodes<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a>, nargs<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">number</span>, [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L179">ctypes.ts:179</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1118,7 +1118,7 @@
 					<div class="tsd-signature tsd-kind-icon">FTVMWasm<wbr>PackedCFunc<wbr>Finalizer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>resourceHandle<span class="tsd-signature-symbol">: </span><a href="index.html#pointer" class="tsd-signature-type">Pointer</a><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L193">ctypes.ts:193</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1154,7 +1154,7 @@
 					<div class="tsd-signature tsd-kind-icon">GPUPointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L25">webgpu.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1169,7 +1169,7 @@
 					<div class="tsd-signature tsd-kind-icon">Packed<wbr>Func<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">...</span>args<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol"> &amp; </span><a href="interfaces/disp [...]
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L36">runtime.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L36">runtime.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1184,7 +1184,7 @@
 					<div class="tsd-signature tsd-kind-icon">Pointer<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L25">ctypes.ts:25</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1199,7 +1199,7 @@
 					<div class="tsd-signature tsd-kind-icon">Ptr<wbr>Offset<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/ctypes.ts#L28">ctypes.ts:28</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1217,7 +1217,7 @@
 					<div class="tsd-signature tsd-kind-icon">RPC_<wbr>MAGIC<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">1045105</span><span class="tsd-signature-symbol"> = 1045105</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/rpc_server.ts#L36">rpc_server.ts:36</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -1239,7 +1239,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/support.ts#L25">support.ts:25</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/support.ts#L25">support.ts:25</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1271,7 +1271,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/support.ts#L39">support.ts:39</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/support.ts#L39">support.ts:39</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1300,7 +1300,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/support.ts#L52">support.ts:52</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/support.ts#L52">support.ts:52</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1337,7 +1337,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/compact.ts#L38">compact.ts:38</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/compact.ts#L38">compact.ts:38</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1368,7 +1368,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L30">webgpu.ts:30</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1390,7 +1390,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/environment.ts#L32">environment.ts:32</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/environment.ts#L32">environment.ts:32</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1421,7 +1421,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/compact.ts#L24">compact.ts:24</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/compact.ts#L24">compact.ts:24</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1443,7 +1443,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L1356">runtime.ts:1356</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L1356">runtime.ts:1356</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1508,7 +1508,7 @@
 						<li class="tsd-description">
 							<aside class="tsd-sources">
 								<ul>
-									<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/support.ts#L62">support.ts:62</a></li>
+									<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/support.ts#L62">support.ts:62</a></li>
 								</ul>
 							</aside>
 							<div class="tsd-comment tsd-typography">
@@ -1530,7 +1530,7 @@
 					<div class="tsd-signature tsd-kind-icon">DLData<wbr>Type<wbr>Code<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L246">runtime.ts:246</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L246">runtime.ts:246</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1539,7 +1539,7 @@
 						<div class="tsd-signature tsd-kind-icon">0<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;int&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L247">runtime.ts:247</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L247">runtime.ts:247</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1549,7 +1549,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;uint&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L248">runtime.ts:248</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L248">runtime.ts:248</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1559,7 +1559,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;float&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L249">runtime.ts:249</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L249">runtime.ts:249</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1569,7 +1569,7 @@
 						<div class="tsd-signature tsd-kind-icon">3<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;handle&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L250">runtime.ts:250</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L250">runtime.ts:250</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1580,7 +1580,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Enum<wbr>ToStr<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L175">runtime.ts:175</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L175">runtime.ts:175</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1589,7 +1589,7 @@
 						<div class="tsd-signature tsd-kind-icon">1<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L176">runtime.ts:176</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L176">runtime.ts:176</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1599,7 +1599,7 @@
 						<div class="tsd-signature tsd-kind-icon">15<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;webgpu&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L180">runtime.ts:180</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L180">runtime.ts:180</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1609,7 +1609,7 @@
 						<div class="tsd-signature tsd-kind-icon">2<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;cuda&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L177">runtime.ts:177</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L177">runtime.ts:177</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1619,7 +1619,7 @@
 						<div class="tsd-signature tsd-kind-icon">4<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;opencl&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L178">runtime.ts:178</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L178">runtime.ts:178</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1629,7 +1629,7 @@
 						<div class="tsd-signature tsd-kind-icon">8<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span><span class="tsd-signature-symbol"> = &quot;metal&quot;</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L179">runtime.ts:179</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L179">runtime.ts:179</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1640,7 +1640,7 @@
 					<div class="tsd-signature tsd-kind-icon">Device<wbr>Str<wbr>ToEnum<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">object</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L183">runtime.ts:183</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L183">runtime.ts:183</a></li>
 						</ul>
 					</aside>
 					<section class="tsd-panel tsd-member tsd-kind-variable tsd-parent-kind-object-literal">
@@ -1649,7 +1649,7 @@
 						<div class="tsd-signature tsd-kind-icon">cl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L186">runtime.ts:186</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L186">runtime.ts:186</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1659,7 +1659,7 @@
 						<div class="tsd-signature tsd-kind-icon">cpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 1</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L184">runtime.ts:184</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L184">runtime.ts:184</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1669,7 +1669,7 @@
 						<div class="tsd-signature tsd-kind-icon">cuda<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 2</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L185">runtime.ts:185</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L185">runtime.ts:185</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1679,7 +1679,7 @@
 						<div class="tsd-signature tsd-kind-icon">metal<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 8</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L189">runtime.ts:189</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L189">runtime.ts:189</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1689,7 +1689,7 @@
 						<div class="tsd-signature tsd-kind-icon">opencl<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 4</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L187">runtime.ts:187</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L187">runtime.ts:187</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1699,7 +1699,7 @@
 						<div class="tsd-signature tsd-kind-icon">vulkan<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 7</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L188">runtime.ts:188</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L188">runtime.ts:188</a></li>
 							</ul>
 						</aside>
 					</section>
@@ -1709,7 +1709,7 @@
 						<div class="tsd-signature tsd-kind-icon">webgpu<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">number</span><span class="tsd-signature-symbol"> = 15</span></div>
 						<aside class="tsd-sources">
 							<ul>
-								<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/runtime.ts#L190">runtime.ts:190</a></li>
+								<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/runtime.ts#L190">runtime.ts:190</a></li>
 							</ul>
 						</aside>
 					</section>
diff --git a/docs/reference/api/typedoc/interfaces/disposable.html b/docs/reference/api/typedoc/interfaces/disposable.html
index f23b56b6a..fbfe472a7 100644
--- a/docs/reference/api/typedoc/interfaces/disposable.html
+++ b/docs/reference/api/typedoc/interfaces/disposable.html
@@ -113,7 +113,7 @@
 					<div class="tsd-signature tsd-kind-icon">dispose<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/types.ts#L52">types.ts:52</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/types.ts#L52">types.ts:52</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/reference/api/typedoc/interfaces/functioninfo.html b/docs/reference/api/typedoc/interfaces/functioninfo.html
index da574ac5e..1f7759024 100644
--- a/docs/reference/api/typedoc/interfaces/functioninfo.html
+++ b/docs/reference/api/typedoc/interfaces/functioninfo.html
@@ -95,7 +95,7 @@
 					<div class="tsd-signature tsd-kind-icon">arg_<wbr>types<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L41">webgpu.ts:41</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -105,7 +105,7 @@
 					<div class="tsd-signature tsd-kind-icon">launch_<wbr>param_<wbr>tags<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Array</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L42">webgpu.ts:42</a></li>
 						</ul>
 					</aside>
 				</section>
@@ -115,7 +115,7 @@
 					<div class="tsd-signature tsd-kind-icon">name<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">string</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/webgpu.ts#L40">webgpu.ts:40</a></li>
 						</ul>
 					</aside>
 				</section>
diff --git a/docs/reference/api/typedoc/interfaces/libraryprovider.html b/docs/reference/api/typedoc/interfaces/libraryprovider.html
index b2d49dbc6..a8ecb1737 100644
--- a/docs/reference/api/typedoc/interfaces/libraryprovider.html
+++ b/docs/reference/api/typedoc/interfaces/libraryprovider.html
@@ -112,7 +112,7 @@
 					<div class="tsd-signature tsd-kind-icon">imports<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-type">Record</span><span class="tsd-signature-symbol">&lt;</span><span class="tsd-signature-type">string</span><span class="tsd-signature-symbol">, </span><span class="tsd-signature-type">any</span><span class="tsd-signature-symbol">&gt;</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/types.ts#L34">types.ts:34</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/types.ts#L34">types.ts:34</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
@@ -127,7 +127,7 @@
 					<div class="tsd-signature tsd-kind-icon">start<span class="tsd-signature-symbol">:</span> <span class="tsd-signature-symbol">(</span>inst<span class="tsd-signature-symbol">: </span><span class="tsd-signature-type">Instance</span><span class="tsd-signature-symbol">)</span><span class="tsd-signature-symbol"> =&gt; </span><span class="tsd-signature-type">void</span></div>
 					<aside class="tsd-sources">
 						<ul>
-							<li>Defined in <a href="https://github.com/apache/tvm/blob/6dbdf2e20/web/src/types.ts#L39">types.ts:39</a></li>
+							<li>Defined in <a href="https://github.com/apache/tvm/blob/f05ebde8e/web/src/types.ts#L39">types.ts:39</a></li>
 						</ul>
 					</aside>
 					<div class="tsd-comment tsd-typography">
diff --git a/docs/searchindex.js b/docs/searchindex.js
index cfb169e5c..2b95ff82b 100644
--- a/docs/searchindex.js
+++ b/docs/searchindex.js
@@ -1 +1 @@
-Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
+Search.setIndex({docnames:["arch/benchmark","arch/convert_layout","arch/debugger","arch/device_target_interactions","arch/frontend/tensorflow","arch/hybrid_script","arch/index","arch/inferbound","arch/introduction_to_module_serialization","arch/microtvm_design","arch/microtvm_project_api","arch/model_library_format","arch/pass_infra","arch/relay_intro","arch/relay_op_strategy","arch/runtime","arch/runtimes/vulkan","arch/security","arch/virtual_machine","contribute/ci","contribute/code_gu [...]
\ No newline at end of file
diff --git a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
index f7f82415a..ab062e8ca 100644
--- a/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/autotvm/sg_execution_times.html
@@ -300,10 +300,10 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-autotvm-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:20.898</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
+<p><strong>00:21.548</strong> total execution time for <strong>topic_vta_tutorials_autotvm</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:20.710</strong>: <a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></li>
-<li><p><strong>00:00.188</strong>: <a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></li>
+<li><p><strong>00:21.332</strong>: <a class="reference internal" href="tune_relay_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-relay-vta-py"><span class="std std-ref">Auto-tuning a convolutional network on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_relay_vta.py</span></code>)</p></li>
+<li><p><strong>00:00.215</strong>: <a class="reference internal" href="tune_alu_vta.html#sphx-glr-topic-vta-tutorials-autotvm-tune-alu-vta-py"><span class="std std-ref">Auto-tuning a ALU fused op on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">tune_alu_vta.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/topic/vta/tutorials/frontend/deploy_classification.html b/docs/topic/vta/tutorials/frontend/deploy_classification.html
index bb07b6cdf..1c7591bb8 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_classification.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_classification.html
@@ -541,7 +541,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
   DeprecationWarning,
 /workspace/vta/tutorials/frontend/deploy_classification.py:213: DeprecationWarning: legacy graph executor behavior of producing json / lib / params will be removed in the next release. Please see documents of tvm.contrib.graph_executor.GraphModule for the  new recommended usage.
   relay_prog, target=tvm.target.Target(target, host=env.target_host), params=params
-resnet18_v1 inference graph built in 21.37s!
+resnet18_v1 inference graph built in 21.96s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/deploy_detection.html b/docs/topic/vta/tutorials/frontend/deploy_detection.html
index 912d439b3..4a2a69626 100644
--- a/docs/topic/vta/tutorials/frontend/deploy_detection.html
+++ b/docs/topic/vta/tutorials/frontend/deploy_detection.html
@@ -559,7 +559,7 @@ and dense layer which will both be executed in fp32 on the CPU.</p></li>
   &quot;target_host parameter is going to be deprecated. &quot;
 /workspace/python/tvm/relay/build_module.py:389: DeprecationWarning: Please use input parameter mod (tvm.IRModule) instead of deprecated parameter mod (tvm.relay.function.Function)
   DeprecationWarning,
-yolov3-tiny inference graph built in 15.05s!
+yolov3-tiny inference graph built in 15.44s!
 </pre></div>
 </div>
 </div>
diff --git a/docs/topic/vta/tutorials/frontend/sg_execution_times.html b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
index 4cd1bdecd..dde9711af 100644
--- a/docs/topic/vta/tutorials/frontend/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/frontend/sg_execution_times.html
@@ -300,10 +300,10 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-frontend-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>01:29.370</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
+<p><strong>01:30.558</strong> total execution time for <strong>topic_vta_tutorials_frontend</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:47.481</strong>: <a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></li>
-<li><p><strong>00:41.889</strong>: <a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></li>
+<li><p><strong>00:47.890</strong>: <a class="reference internal" href="deploy_detection.html#sphx-glr-topic-vta-tutorials-frontend-deploy-detection-py"><span class="std std-ref">Deploy Pretrained Vision Detection Model from Darknet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_detection.py</span></code>)</p></li>
+<li><p><strong>00:42.668</strong>: <a class="reference internal" href="deploy_classification.html#sphx-glr-topic-vta-tutorials-frontend-deploy-classification-py"><span class="std std-ref">Deploy Pretrained Vision Model from MxNet on VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">deploy_classification.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/topic/vta/tutorials/optimize/sg_execution_times.html b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
index b33620a3e..cf86ef004 100644
--- a/docs/topic/vta/tutorials/optimize/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/optimize/sg_execution_times.html
@@ -300,10 +300,10 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-optimize-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:03.545</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
+<p><strong>00:03.586</strong> total execution time for <strong>topic_vta_tutorials_optimize</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:02.977</strong>: <a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></li>
-<li><p><strong>00:00.568</strong>: <a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></li>
+<li><p><strong>00:03.005</strong>: <a class="reference internal" href="convolution_opt.html#sphx-glr-topic-vta-tutorials-optimize-convolution-opt-py"><span class="std std-ref">2D Convolution Optimization</span></a> (<code class="docutils literal notranslate"><span class="pre">convolution_opt.py</span></code>)</p></li>
+<li><p><strong>00:00.581</strong>: <a class="reference internal" href="matrix_multiply_opt.html#sphx-glr-topic-vta-tutorials-optimize-matrix-multiply-opt-py"><span class="std std-ref">Matrix Multiply Blocking</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply_opt.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/topic/vta/tutorials/sg_execution_times.html b/docs/topic/vta/tutorials/sg_execution_times.html
index 36c8fedae..24e8a52ae 100644
--- a/docs/topic/vta/tutorials/sg_execution_times.html
+++ b/docs/topic/vta/tutorials/sg_execution_times.html
@@ -300,10 +300,10 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-topic-vta-tutorials-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>00:01.041</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
+<p><strong>00:01.058</strong> total execution time for <strong>topic_vta_tutorials</strong> files:</p>
 <ul class="simple">
-<li><p><strong>00:00.526</strong>: <a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></li>
-<li><p><strong>00:00.515</strong>: <a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></li>
+<li><p><strong>00:00.539</strong>: <a class="reference internal" href="matrix_multiply.html#sphx-glr-topic-vta-tutorials-matrix-multiply-py"><span class="std std-ref">Simple Matrix Multiply</span></a> (<code class="docutils literal notranslate"><span class="pre">matrix_multiply.py</span></code>)</p></li>
+<li><p><strong>00:00.519</strong>: <a class="reference internal" href="vta_get_started.html#sphx-glr-topic-vta-tutorials-vta-get-started-py"><span class="std std-ref">Get Started with VTA</span></a> (<code class="docutils literal notranslate"><span class="pre">vta_get_started.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/tutorial/auto_scheduler_matmul_x86.html b/docs/tutorial/auto_scheduler_matmul_x86.html
index 9b6f39ab0..8f8b6654d 100644
--- a/docs/tutorial/auto_scheduler_matmul_x86.html
+++ b/docs/tutorial/auto_scheduler_matmul_x86.html
@@ -545,7 +545,7 @@ operator fusion.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 93.672 ms
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Execution time of this operator: 93.732 ms
 </pre></div>
 </div>
 </div>
@@ -611,7 +611,6 @@ resume the status and do more 5 trials.</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Resume search:
 /usr/local/lib/python3.7/dist-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
   warnings.warn(f&#39;Old style callback is deprecated.  See: {link}&#39;, UserWarning)
-*E
 </pre></div>
 </div>
 </div>
@@ -622,7 +621,6 @@ automatically optimize a matrix multiplication, without the need to specify a
 search template.  It ends a series of examples that starts from the Tensor
 Expression (TE) language that demonstrates how TVM can optimize computational
 operations.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  7.017 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-auto-scheduler-matmul-x86-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/eac4389b114db015e95cb3cdf8b86b83/auto_scheduler_matmul_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">auto_scheduler_matmul_x86.py</span></code></a></p>
diff --git a/docs/tutorial/autotvm_relay_x86.html b/docs/tutorial/autotvm_relay_x86.html
index b1f827b5d..2ae105fff 100644
--- a/docs/tutorial/autotvm_relay_x86.html
+++ b/docs/tutorial/autotvm_relay_x86.html
@@ -521,7 +521,7 @@ standard deviation.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 494.0058553899962, &#39;median&#39;: 493.83366904999093, &#39;std&#39;: 0.6866634716120491}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>{&#39;mean&#39;: 497.6316300500048, &#39;median&#39;: 497.7335548499923, &#39;std&#39;: 1.7081865449643931}
 </pre></div>
 </div>
 </div>
@@ -675,179 +675,179 @@ depending on the specifics of the model and the target platform.</p>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[Task  1/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  1/25]  Current/Best:   17.55/  17.55 GFLOPS | Progress: (4/20) | 5.92 s
-[Task  1/25]  Current/Best:    6.17/  17.55 GFLOPS | Progress: (8/20) | 8.90 s
-[Task  1/25]  Current/Best:   11.57/  22.76 GFLOPS | Progress: (12/20) | 11.32 s
-[Task  1/25]  Current/Best:   16.87/  22.78 GFLOPS | Progress: (16/20) | 12.99 s
-[Task  1/25]  Current/Best:   11.61/  23.96 GFLOPS | Progress: (20/20) | 14.71 s Done.
+[Task  1/25]  Current/Best:   17.44/  17.44 GFLOPS | Progress: (4/20) | 6.03 s
+[Task  1/25]  Current/Best:    6.17/  17.44 GFLOPS | Progress: (8/20) | 8.99 s
+[Task  1/25]  Current/Best:   11.53/  22.67 GFLOPS | Progress: (12/20) | 11.44 s
+[Task  1/25]  Current/Best:   16.77/  22.67 GFLOPS | Progress: (16/20) | 13.14 s
+[Task  1/25]  Current/Best:   11.59/  23.91 GFLOPS | Progress: (20/20) | 14.88 s Done.
 
 [Task  2/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  2/25]  Current/Best:   12.06/  12.92 GFLOPS | Progress: (4/20) | 3.78 s
-[Task  2/25]  Current/Best:   14.10/  18.85 GFLOPS | Progress: (8/20) | 5.08 s
-[Task  2/25]  Current/Best:   20.70/  20.70 GFLOPS | Progress: (12/20) | 6.41 s
-[Task  2/25]  Current/Best:   12.51/  20.70 GFLOPS | Progress: (16/20) | 7.65 s
-[Task  2/25]  Current/Best:   19.29/  20.70 GFLOPS | Progress: (20/20) | 9.24 s Done.
+[Task  2/25]  Current/Best:   12.30/  13.27 GFLOPS | Progress: (4/20) | 3.72 s
+[Task  2/25]  Current/Best:   13.62/  18.58 GFLOPS | Progress: (8/20) | 5.06 s
+[Task  2/25]  Current/Best:   21.17/  21.17 GFLOPS | Progress: (12/20) | 6.37 s
+[Task  2/25]  Current/Best:   12.92/  21.17 GFLOPS | Progress: (16/20) | 7.68 s
+[Task  2/25]  Current/Best:   19.54/  21.17 GFLOPS | Progress: (20/20) | 9.31 s Done.
 
 [Task  3/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  3/25]  Current/Best:    1.63/  10.60 GFLOPS | Progress: (4/20) | 5.76 s
-[Task  3/25]  Current/Best:   15.61/  16.92 GFLOPS | Progress: (8/20) | 7.68 s
-[Task  3/25]  Current/Best:   14.92/  16.92 GFLOPS | Progress: (12/20) | 9.38 s
-[Task  3/25]  Current/Best:    7.21/  23.86 GFLOPS | Progress: (16/20) | 11.25 s
-[Task  3/25]  Current/Best:   11.36/  23.86 GFLOPS | Progress: (20/20) | 15.80 s Done.
+[Task  3/25]  Current/Best:    1.63/  10.54 GFLOPS | Progress: (4/20) | 5.84 s
+[Task  3/25]  Current/Best:   15.55/  16.85 GFLOPS | Progress: (8/20) | 7.78 s
+[Task  3/25]  Current/Best:   14.88/  16.85 GFLOPS | Progress: (12/20) | 9.50 s
+[Task  3/25]  Current/Best:    7.02/  23.68 GFLOPS | Progress: (16/20) | 11.43 s
+[Task  3/25]  Current/Best:   11.25/  23.68 GFLOPS | Progress: (20/20) | 16.02 s Done.
 
 [Task  4/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  4/25]  Current/Best:    9.57/  20.41 GFLOPS | Progress: (4/20) | 2.31 s
-[Task  4/25]  Current/Best:    6.84/  20.41 GFLOPS | Progress: (8/20) | 6.95 s
-[Task  4/25]  Current/Best:   22.51/  22.51 GFLOPS | Progress: (12/20) | 11.71 s
-[Task  4/25]  Current/Best:   17.37/  22.51 GFLOPS | Progress: (16/20) | 14.07 s
-[Task  4/25]  Current/Best:   13.40/  22.51 GFLOPS | Progress: (20/20) | 16.03 s Done.
+[Task  4/25]  Current/Best:    9.17/  20.53 GFLOPS | Progress: (4/20) | 2.37 s
+[Task  4/25]  Current/Best:    6.78/  20.53 GFLOPS | Progress: (8/20) | 7.11 s
+[Task  4/25]  Current/Best:   21.61/  21.61 GFLOPS | Progress: (12/20) | 11.94 s
+[Task  4/25]  Current/Best:   17.25/  21.61 GFLOPS | Progress: (16/20) | 14.34 s
+[Task  4/25]  Current/Best:   13.20/  21.61 GFLOPS | Progress: (20/20) | 16.41 s Done.
 
 [Task  5/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  5/25]  Current/Best:    9.70/  10.35 GFLOPS | Progress: (4/20) | 2.49 s
-[Task  5/25]  Current/Best:   11.78/  12.30 GFLOPS | Progress: (8/20) | 4.57 s
-[Task  5/25]  Current/Best:   11.83/  18.09 GFLOPS | Progress: (12/20) | 7.73 s
-[Task  5/25]  Current/Best:   11.83/  22.78 GFLOPS | Progress: (16/20) | 9.14 s
-[Task  5/25]  Current/Best:   12.07/  22.78 GFLOPS | Progress: (20/20) | 11.06 s Done.
+[Task  5/25]  Current/Best:    9.58/  10.20 GFLOPS | Progress: (4/20) | 2.56 s
+[Task  5/25]  Current/Best:   11.80/  12.70 GFLOPS | Progress: (8/20) | 4.64 s
+[Task  5/25]  Current/Best:   11.77/  18.04 GFLOPS | Progress: (12/20) | 7.83 s
+[Task  5/25]  Current/Best:   11.70/  22.56 GFLOPS | Progress: (16/20) | 9.25 s
+[Task  5/25]  Current/Best:   12.09/  22.56 GFLOPS | Progress: (20/20) | 11.17 s Done.
 
 [Task  6/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  6/25]  Current/Best:   12.27/  20.78 GFLOPS | Progress: (4/20) | 4.00 s
-[Task  6/25]  Current/Best:   19.02/  20.78 GFLOPS | Progress: (8/20) | 5.75 s
-[Task  6/25]  Current/Best:   13.22/  20.78 GFLOPS | Progress: (12/20) | 7.70 s
-[Task  6/25]  Current/Best:   19.89/  20.78 GFLOPS | Progress: (16/20) | 9.92 s
-[Task  6/25]  Current/Best:    3.73/  20.78 GFLOPS | Progress: (20/20) | 12.46 s Done.
+[Task  6/25]  Current/Best:   12.20/  20.80 GFLOPS | Progress: (4/20) | 4.00 s
+[Task  6/25]  Current/Best:   19.00/  20.80 GFLOPS | Progress: (8/20) | 5.77 s
+[Task  6/25]  Current/Best:   13.28/  20.80 GFLOPS | Progress: (12/20) | 7.73 s
+[Task  6/25]  Current/Best:   19.98/  20.80 GFLOPS | Progress: (16/20) | 9.96 s
+[Task  6/25]  Current/Best:    3.71/  20.80 GFLOPS | Progress: (20/20) | 12.47 s Done.
 
 [Task  7/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  7/25]  Current/Best:   11.21/  12.78 GFLOPS | Progress: (4/20) | 3.52 s
-[Task  7/25]  Current/Best:   20.28/  21.16 GFLOPS | Progress: (8/20) | 5.02 s
-[Task  7/25]  Current/Best:   16.11/  21.16 GFLOPS | Progress: (12/20) | 6.91 s
-[Task  7/25]  Current/Best:   12.27/  21.16 GFLOPS | Progress: (16/20) | 8.94 s
-[Task  7/25]  Current/Best:    6.31/  21.92 GFLOPS | Progress: (20/20) | 11.38 s Done.
+[Task  7/25]  Current/Best:   11.24/  12.92 GFLOPS | Progress: (4/20) | 3.46 s
+[Task  7/25]  Current/Best:   19.99/  21.01 GFLOPS | Progress: (8/20) | 4.97 s
+[Task  7/25]  Current/Best:   15.86/  21.01 GFLOPS | Progress: (12/20) | 6.91 s
+[Task  7/25]  Current/Best:   12.27/  21.01 GFLOPS | Progress: (16/20) | 8.96 s
+[Task  7/25]  Current/Best:    6.43/  21.67 GFLOPS | Progress: (20/20) | 11.43 s Done.
 
 [Task  8/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  8/25]  Current/Best:    9.81/  14.29 GFLOPS | Progress: (4/20) | 2.83 s
-[Task  8/25]  Current/Best:    9.59/  14.29 GFLOPS | Progress: (8/20) | 7.92 s
-[Task  8/25]  Current/Best:   12.69/  14.29 GFLOPS | Progress: (12/20) | 14.29 s
-[Task  8/25]  Current/Best:   18.83/  18.83 GFLOPS | Progress: (16/20) | 16.37 s
-[Task  8/25]  Current/Best:   19.88/  19.88 GFLOPS | Progress: (20/20) | 23.33 s Done.
+[Task  8/25]  Current/Best:    9.97/  13.84 GFLOPS | Progress: (4/20) | 2.85 s
+[Task  8/25]  Current/Best:    9.40/  13.84 GFLOPS | Progress: (8/20) | 7.98 s
+[Task  8/25]  Current/Best:   12.71/  13.84 GFLOPS | Progress: (12/20) | 14.44 s
+[Task  8/25]  Current/Best:   19.01/  19.01 GFLOPS | Progress: (16/20) | 16.57 s
+[Task  8/25]  Current/Best:   19.87/  19.87 GFLOPS | Progress: (20/20) | 23.67 s Done.
 
 [Task  9/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task  9/25]  Current/Best:   14.32/  14.32 GFLOPS | Progress: (4/20) | 11.89 s
-[Task  9/25]  Current/Best:   23.50/  23.50 GFLOPS | Progress: (8/20) | 13.59 s
-[Task  9/25]  Current/Best:    8.30/  23.50 GFLOPS | Progress: (12/20) | 16.11 s
-[Task  9/25]  Current/Best:   17.91/  23.50 GFLOPS | Progress: (16/20) | 18.99 s
-[Task  9/25]  Current/Best:    9.07/  23.50 GFLOPS | Progress: (20/20) | 27.51 s
+[Task  9/25]  Current/Best:   14.32/  15.77 GFLOPS | Progress: (4/20) | 11.88 s
+[Task  9/25]  Current/Best:   19.26/  19.88 GFLOPS | Progress: (8/20) | 13.64 s
+[Task  9/25]  Current/Best:    8.25/  19.88 GFLOPS | Progress: (12/20) | 16.13 s
+[Task  9/25]  Current/Best:   18.05/  19.88 GFLOPS | Progress: (16/20) | 18.96 s
+[Task  9/25]  Current/Best:    9.05/  19.88 GFLOPS | Progress: (20/20) | 27.46 s
 [Task 10/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 10/25]  Current/Best:   18.05/  18.05 GFLOPS | Progress: (4/20) | 2.52 s
-[Task 10/25]  Current/Best:   15.39/  18.05 GFLOPS | Progress: (8/20) | 4.13 s
-[Task 10/25]  Current/Best:   12.58/  18.68 GFLOPS | Progress: (12/20) | 5.66 s
-[Task 10/25]  Current/Best:   19.00/  20.37 GFLOPS | Progress: (16/20) | 6.76 s
-[Task 10/25]  Current/Best:    8.86/  20.37 GFLOPS | Progress: (20/20) | 8.30 s Done.
+[Task 10/25]  Current/Best:   18.12/  18.12 GFLOPS | Progress: (4/20) | 2.49 s
+[Task 10/25]  Current/Best:   15.54/  18.12 GFLOPS | Progress: (8/20) | 4.12 s
+[Task 10/25]  Current/Best:   12.59/  18.74 GFLOPS | Progress: (12/20) | 5.67 s
+[Task 10/25]  Current/Best:   19.19/  20.31 GFLOPS | Progress: (16/20) | 6.77 s
+[Task 10/25]  Current/Best:    8.83/  20.31 GFLOPS | Progress: (20/20) | 8.31 s Done.
 
 [Task 11/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 11/25]  Current/Best:   12.36/  18.17 GFLOPS | Progress: (4/20) | 3.24 s
-[Task 11/25]  Current/Best:   16.72/  18.17 GFLOPS | Progress: (8/20) | 6.03 s
-[Task 11/25]  Current/Best:   17.58/  18.17 GFLOPS | Progress: (12/20) | 8.09 s
-[Task 11/25]  Current/Best:   13.46/  21.23 GFLOPS | Progress: (16/20) | 10.93 s
-[Task 11/25]  Current/Best:   19.38/  21.64 GFLOPS | Progress: (20/20) | 13.00 s Done.
+[Task 11/25]  Current/Best:   12.23/  18.09 GFLOPS | Progress: (4/20) | 3.32 s
+[Task 11/25]  Current/Best:   16.85/  18.09 GFLOPS | Progress: (8/20) | 6.14 s
+[Task 11/25]  Current/Best:   18.11/  18.11 GFLOPS | Progress: (12/20) | 8.18 s
+[Task 11/25]  Current/Best:   13.13/  21.29 GFLOPS | Progress: (16/20) | 11.06 s
+[Task 11/25]  Current/Best:   19.54/  21.60 GFLOPS | Progress: (20/20) | 13.15 s Done.
 
 [Task 12/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 12/25]  Current/Best:    7.86/  18.07 GFLOPS | Progress: (4/20) | 5.63 s
-[Task 12/25]  Current/Best:    5.23/  18.07 GFLOPS | Progress: (8/20) | 9.50 s
-[Task 12/25]  Current/Best:   18.88/  18.88 GFLOPS | Progress: (12/20) | 11.47 s
-[Task 12/25]  Current/Best:   15.53/  18.88 GFLOPS | Progress: (16/20) | 14.38 s
-[Task 12/25]  Current/Best:   15.17/  18.88 GFLOPS | Progress: (20/20) | 16.28 s Done.
+[Task 12/25]  Current/Best:    7.83/  18.08 GFLOPS | Progress: (4/20) | 5.65 s
+[Task 12/25]  Current/Best:    5.20/  18.08 GFLOPS | Progress: (8/20) | 9.57 s
+[Task 12/25]  Current/Best:   18.93/  18.93 GFLOPS | Progress: (12/20) | 11.57 s
+[Task 12/25]  Current/Best:   13.94/  18.93 GFLOPS | Progress: (16/20) | 14.54 s
+[Task 12/25]  Current/Best:   15.04/  18.93 GFLOPS | Progress: (20/20) | 16.48 s Done.
 
 [Task 13/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 13/25]  Current/Best:    8.20/  17.34 GFLOPS | Progress: (4/20) | 3.64 s
-[Task 13/25]  Current/Best:   16.03/  21.01 GFLOPS | Progress: (8/20) | 6.19 s
-[Task 13/25]  Current/Best:   19.72/  21.78 GFLOPS | Progress: (12/20) | 9.23 s
-[Task 13/25]  Current/Best:   12.34/  21.78 GFLOPS | Progress: (16/20) | 12.64 s
-[Task 13/25]  Current/Best:   18.56/  21.78 GFLOPS | Progress: (20/20) | 14.96 s Done.
+[Task 13/25]  Current/Best:    8.74/  17.34 GFLOPS | Progress: (4/20) | 3.73 s
+[Task 13/25]  Current/Best:   16.06/  20.75 GFLOPS | Progress: (8/20) | 6.33 s
+[Task 13/25]  Current/Best:   19.55/  20.75 GFLOPS | Progress: (12/20) | 9.49 s
+[Task 13/25]  Current/Best:   12.23/  20.75 GFLOPS | Progress: (16/20) | 13.01 s
+[Task 13/25]  Current/Best:   18.34/  20.75 GFLOPS | Progress: (20/20) | 15.35 s Done.
 
 [Task 14/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 14/25]  Current/Best:   13.69/  13.69 GFLOPS | Progress: (4/20) | 3.33 s
-[Task 14/25]  Current/Best:    6.00/  13.69 GFLOPS | Progress: (8/20) | 5.52 s
-[Task 14/25]  Current/Best:   20.86/  20.86 GFLOPS | Progress: (12/20) | 8.20 s
-[Task 14/25]  Current/Best:   16.07/  20.86 GFLOPS | Progress: (16/20) | 9.85 s Done.
+[Task 14/25]  Current/Best:   13.67/  13.67 GFLOPS | Progress: (4/20) | 3.42 s
+[Task 14/25]  Current/Best:    6.05/  13.67 GFLOPS | Progress: (8/20) | 5.61 s
+[Task 14/25]  Current/Best:   20.34/  20.34 GFLOPS | Progress: (12/20) | 8.32 s
+[Task 14/25]  Current/Best:   15.83/  20.34 GFLOPS | Progress: (16/20) | 9.98 s Done.
 
-[Task 14/25]  Current/Best:   17.35/  20.86 GFLOPS | Progress: (20/20) | 11.52 s
+[Task 14/25]  Current/Best:   17.61/  20.34 GFLOPS | Progress: (20/20) | 11.75 s
 [Task 15/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 15/25]  Current/Best:   16.13/  17.67 GFLOPS | Progress: (4/20) | 2.59 s
-[Task 15/25]  Current/Best:   14.40/  18.07 GFLOPS | Progress: (8/20) | 3.94 s
-[Task 15/25]  Current/Best:   10.39/  22.37 GFLOPS | Progress: (12/20) | 6.15 s
-[Task 15/25]  Current/Best:   20.24/  22.37 GFLOPS | Progress: (16/20) | 9.27 s
-[Task 15/25]  Current/Best:    9.54/  22.37 GFLOPS | Progress: (20/20) | 10.29 s
+[Task 15/25]  Current/Best:   16.18/  17.60 GFLOPS | Progress: (4/20) | 2.66 s
+[Task 15/25]  Current/Best:   14.40/  17.60 GFLOPS | Progress: (8/20) | 3.95 s
+[Task 15/25]  Current/Best:   10.41/  22.34 GFLOPS | Progress: (12/20) | 6.20 s
+[Task 15/25]  Current/Best:   20.44/  22.34 GFLOPS | Progress: (16/20) | 9.54 s
+[Task 15/25]  Current/Best:    9.71/  22.34 GFLOPS | Progress: (20/20) | 10.55 s
 [Task 16/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 16/25]  Current/Best:   19.62/  19.62 GFLOPS | Progress: (4/20) | 2.81 s
-[Task 16/25]  Current/Best:    3.04/  19.62 GFLOPS | Progress: (8/20) | 4.41 s
-[Task 16/25]  Current/Best:   19.40/  19.62 GFLOPS | Progress: (12/20) | 5.63 s
-[Task 16/25]  Current/Best:   17.86/  19.62 GFLOPS | Progress: (16/20) | 7.01 s
-[Task 16/25]  Current/Best:    9.98/  22.44 GFLOPS | Progress: (20/20) | 9.15 s Done.
+[Task 16/25]  Current/Best:   20.66/  20.66 GFLOPS | Progress: (4/20) | 2.84 s
+[Task 16/25]  Current/Best:    3.04/  20.66 GFLOPS | Progress: (8/20) | 4.48 s
+[Task 16/25]  Current/Best:   19.72/  20.66 GFLOPS | Progress: (12/20) | 5.69 s
+[Task 16/25]  Current/Best:   17.87/  20.66 GFLOPS | Progress: (16/20) | 7.03 s
+[Task 16/25]  Current/Best:    9.99/  20.66 GFLOPS | Progress: (20/20) | 9.19 s Done.
 
 [Task 17/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 17/25]  Current/Best:   12.92/  18.67 GFLOPS | Progress: (4/20) | 4.70 s
-[Task 17/25]  Current/Best:   14.47/  23.46 GFLOPS | Progress: (8/20) | 7.54 s
-[Task 17/25]  Current/Best:   16.72/  23.46 GFLOPS | Progress: (12/20) | 9.58 s
-[Task 17/25]  Current/Best:   16.55/  23.46 GFLOPS | Progress: (16/20) | 11.80 s
-[Task 17/25]  Current/Best:   10.04/  23.46 GFLOPS | Progress: (20/20) | 13.93 s Done.
+[Task 17/25]  Current/Best:   13.11/  18.79 GFLOPS | Progress: (4/20) | 4.71 s
+[Task 17/25]  Current/Best:   13.83/  23.24 GFLOPS | Progress: (8/20) | 7.63 s
+[Task 17/25]  Current/Best:   17.17/  23.24 GFLOPS | Progress: (12/20) | 9.70 s
+[Task 17/25]  Current/Best:   16.61/  23.24 GFLOPS | Progress: (16/20) | 11.90 s
+[Task 17/25]  Current/Best:   10.02/  23.24 GFLOPS | Progress: (20/20) | 14.08 s Done.
 
 [Task 18/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 18/25]  Current/Best:   11.38/  16.89 GFLOPS | Progress: (4/20) | 3.72 s
-[Task 18/25]  Current/Best:   10.53/  20.06 GFLOPS | Progress: (8/20) | 7.32 s
-[Task 18/25]  Current/Best:   19.31/  20.06 GFLOPS | Progress: (12/20) | 9.24 s
-[Task 18/25]  Current/Best:   10.09/  20.06 GFLOPS | Progress: (16/20) | 13.07 s
-[Task 18/25]  Current/Best:   20.52/  20.52 GFLOPS | Progress: (20/20) | 14.59 s Done.
+[Task 18/25]  Current/Best:   11.32/  17.62 GFLOPS | Progress: (4/20) | 3.76 s
+[Task 18/25]  Current/Best:   10.51/  19.59 GFLOPS | Progress: (8/20) | 7.41 s
+[Task 18/25]  Current/Best:   19.40/  19.59 GFLOPS | Progress: (12/20) | 9.32 s
+[Task 18/25]  Current/Best:    9.84/  19.59 GFLOPS | Progress: (16/20) | 13.25 s
+[Task 18/25]  Current/Best:   20.69/  20.69 GFLOPS | Progress: (20/20) | 14.77 s Done.
 
 [Task 19/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 19/25]  Current/Best:    7.06/  20.42 GFLOPS | Progress: (4/20) | 6.04 s
-[Task 19/25]  Current/Best:    2.61/  20.42 GFLOPS | Progress: (8/20) | 9.43 s
-[Task 19/25]  Current/Best:   20.20/  21.87 GFLOPS | Progress: (12/20) | 12.41 s
-[Task 19/25]  Current/Best:   14.12/  21.87 GFLOPS | Progress: (16/20) | 15.44 s
-[Task 19/25]  Current/Best:    2.71/  23.80 GFLOPS | Progress: (20/20) | 18.26 s Done.
+[Task 19/25]  Current/Best:    6.99/  20.25 GFLOPS | Progress: (4/20) | 6.15 s
+[Task 19/25]  Current/Best:    2.61/  20.25 GFLOPS | Progress: (8/20) | 9.50 s
+[Task 19/25]  Current/Best:   18.96/  20.68 GFLOPS | Progress: (12/20) | 12.46 s
+[Task 19/25]  Current/Best:   13.57/  20.68 GFLOPS | Progress: (16/20) | 15.48 s
+[Task 19/25]  Current/Best:    2.70/  23.00 GFLOPS | Progress: (20/20) | 18.27 s Done.
 
 [Task 20/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 20/25]  Current/Best:    8.06/  15.05 GFLOPS | Progress: (4/20) | 3.24 s Done.
+[Task 20/25]  Current/Best:    8.89/  14.89 GFLOPS | Progress: (4/20) | 3.36 s Done.
  Done.
 
-[Task 20/25]  Current/Best:    9.82/  15.05 GFLOPS | Progress: (8/20) | 6.82 s
-[Task 20/25]  Current/Best:    2.32/  16.51 GFLOPS | Progress: (12/20) | 10.80 s
-[Task 20/25]  Current/Best:   12.40/  16.51 GFLOPS | Progress: (16/20) | 14.50 s
-[Task 20/25]  Current/Best:   11.61/  22.12 GFLOPS | Progress: (20/20) | 16.61 s
+[Task 20/25]  Current/Best:   10.36/  14.89 GFLOPS | Progress: (8/20) | 6.78 s
+[Task 20/25]  Current/Best:    2.32/  16.68 GFLOPS | Progress: (12/20) | 10.78 s
+[Task 20/25]  Current/Best:   12.55/  16.68 GFLOPS | Progress: (16/20) | 14.79 s
+[Task 20/25]  Current/Best:   13.47/  21.67 GFLOPS | Progress: (20/20) | 16.92 s
 [Task 21/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 21/25]  Current/Best:    6.42/  17.71 GFLOPS | Progress: (4/20) | 3.19 s
-[Task 21/25]  Current/Best:   14.68/  17.71 GFLOPS | Progress: (8/20) | 4.79 s
-[Task 21/25]  Current/Best:    1.61/  17.71 GFLOPS | Progress: (12/20) | 6.91 s
-[Task 21/25]  Current/Best:   18.19/  18.19 GFLOPS | Progress: (16/20) | 10.36 s
-[Task 21/25]  Current/Best:    4.46/  18.19 GFLOPS | Progress: (20/20) | 17.55 s
+[Task 21/25]  Current/Best:    6.37/  17.70 GFLOPS | Progress: (4/20) | 3.25 s
+[Task 21/25]  Current/Best:   14.62/  17.70 GFLOPS | Progress: (8/20) | 4.87 s
+[Task 21/25]  Current/Best:    1.61/  17.70 GFLOPS | Progress: (12/20) | 7.01 s
+[Task 21/25]  Current/Best:   17.13/  17.70 GFLOPS | Progress: (16/20) | 10.53 s
+[Task 21/25]  Current/Best:    4.47/  17.70 GFLOPS | Progress: (20/20) | 18.00 s
 [Task 22/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 22/25]  Current/Best:    2.71/  17.01 GFLOPS | Progress: (4/20) | 2.59 s
-[Task 22/25]  Current/Best:    8.76/  21.75 GFLOPS | Progress: (8/20) | 4.63 s
-[Task 22/25]  Current/Best:   20.02/  21.75 GFLOPS | Progress: (12/20) | 6.98 s
-[Task 22/25]  Current/Best:   15.30/  21.75 GFLOPS | Progress: (16/20) | 9.13 s
-[Task 22/25]  Current/Best:   14.45/  21.75 GFLOPS | Progress: (20/20) | 10.84 s Done.
+[Task 22/25]  Current/Best:    2.70/  17.05 GFLOPS | Progress: (4/20) | 2.65 s
+[Task 22/25]  Current/Best:    8.66/  21.52 GFLOPS | Progress: (8/20) | 4.68 s
+[Task 22/25]  Current/Best:   19.76/  21.52 GFLOPS | Progress: (12/20) | 7.04 s
+[Task 22/25]  Current/Best:   15.32/  21.52 GFLOPS | Progress: (16/20) | 9.17 s
+[Task 22/25]  Current/Best:   13.02/  21.52 GFLOPS | Progress: (20/20) | 10.92 s Done.
 
 [Task 23/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 23/25]  Current/Best:   17.68/  21.00 GFLOPS | Progress: (4/20) | 3.18 s
-[Task 23/25]  Current/Best:   13.70/  21.00 GFLOPS | Progress: (8/20) | 6.66 s
-[Task 23/25]  Current/Best:   21.02/  21.90 GFLOPS | Progress: (12/20) | 8.49 s
-[Task 23/25]  Current/Best:    6.51/  21.90 GFLOPS | Progress: (16/20) | 15.50 s
-[Task 23/25]  Current/Best:    7.96/  21.90 GFLOPS | Progress: (20/20) | 19.70 s Done.
+[Task 23/25]  Current/Best:   17.32/  20.80 GFLOPS | Progress: (4/20) | 3.18 s
+[Task 23/25]  Current/Best:   15.77/  20.80 GFLOPS | Progress: (8/20) | 6.65 s
+[Task 23/25]  Current/Best:   20.67/  21.24 GFLOPS | Progress: (12/20) | 8.51 s
+[Task 23/25]  Current/Best:    6.22/  21.24 GFLOPS | Progress: (16/20) | 15.63 s
+[Task 23/25]  Current/Best:    7.49/  21.24 GFLOPS | Progress: (20/20) | 19.90 s Done.
 
 [Task 24/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 24/25]  Current/Best:    8.52/   8.52 GFLOPS | Progress: (4/20) | 11.71 s
-[Task 24/25]  Current/Best:    3.77/   8.52 GFLOPS | Progress: (8/20) | 22.88 s
-[Task 24/25]  Current/Best:    4.48/   8.52 GFLOPS | Progress: (12/20) | 33.60 s Done.
+[Task 24/25]  Current/Best:    8.27/   8.27 GFLOPS | Progress: (4/20) | 11.75 s
+[Task 24/25]  Current/Best:    3.61/   8.27 GFLOPS | Progress: (8/20) | 22.96 s
+[Task 24/25]  Current/Best:    4.31/   8.27 GFLOPS | Progress: (12/20) | 33.68 s Done.
  Done.
 
-[Task 24/25]  Current/Best:    6.24/   8.84 GFLOPS | Progress: (16/20) | 39.53 s
-[Task 24/25]  Current/Best:    3.41/   8.89 GFLOPS | Progress: (20/20) | 45.50 s Done.
+[Task 24/25]  Current/Best:    6.07/   8.81 GFLOPS | Progress: (16/20) | 39.46 s
+[Task 24/25]  Current/Best:    3.32/   8.81 GFLOPS | Progress: (20/20) | 45.51 s Done.
 
 [Task 25/25]  Current/Best:    0.00/   0.00 GFLOPS | Progress: (0/20) | 0.00 s
-[Task 25/25]  Current/Best:    1.55/   2.75 GFLOPS | Progress: (4/20) | 11.53 s
-[Task 25/25]  Current/Best:    6.24/   8.03 GFLOPS | Progress: (8/20) | 22.76 s
-[Task 25/25]  Current/Best:    5.99/   8.03 GFLOPS | Progress: (12/20) | 34.01 s
-[Task 25/25]  Current/Best:    5.80/   8.79 GFLOPS | Progress: (16/20) | 35.87 s
-[Task 25/25]  Current/Best:    2.84/   9.40 GFLOPS | Progress: (20/20) | 46.56 s
+[Task 25/25]  Current/Best:    1.55/   2.94 GFLOPS | Progress: (4/20) | 11.55 s
+[Task 25/25]  Current/Best:    5.78/   8.61 GFLOPS | Progress: (8/20) | 22.77 s
+[Task 25/25]  Current/Best:    5.99/   8.61 GFLOPS | Progress: (12/20) | 34.23 s
+[Task 25/25]  Current/Best:    5.87/   8.89 GFLOPS | Progress: (16/20) | 36.12 s
+[Task 25/25]  Current/Best:    2.84/   8.89 GFLOPS | Progress: (20/20) | 46.85 s
 </pre></div>
 </div>
 <p>The output from this tuning process will look something like this:</p>
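<p>For context, the "[Task xx/25]" progress lines above come from AutoTVM's per-task tuning loop. A minimal sketch, assuming <code>tasks</code> was extracted with autotvm.task.extract_from_program and <code>measure_option</code> is already configured (names follow the tutorial's autotvm_relay_x86.py):</p>

    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner = XGBTuner(task, loss_type="rank")
        tuner.tune(
            n_trial=20,  # matches the "(20/20)" progress counts above
            measure_option=measure_option,
            callbacks=[
                autotvm.callback.progress_bar(20, prefix=prefix),
                autotvm.callback.log_to_file("resnet-50-v2-autotuning.json"),
            ],
        )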
@@ -948,8 +948,8 @@ improvement in comparing the optimized model to the unoptimized model.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 411.31273892002355, &#39;median&#39;: 411.2051228000382, &#39;std&#39;: 0.8036804155483971}
-unoptimized: {&#39;mean&#39;: 494.0058553899962, &#39;median&#39;: 493.83366904999093, &#39;std&#39;: 0.6866634716120491}
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>optimized: {&#39;mean&#39;: 416.6099685100062, &#39;median&#39;: 416.9126989500228, &#39;std&#39;: 1.5312354032066489}
+unoptimized: {&#39;mean&#39;: 497.6316300500048, &#39;median&#39;: 497.7335548499923, &#39;std&#39;: 1.7081865449643931}
 </pre></div>
 </div>
 </div>
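<p>The optimized/unoptimized statistics in the hunk above are produced by timing module.run() repeatedly; a sketch of the measurement, assuming <code>module</code> is the compiled GraphModule from earlier in the tutorial:</p>

    import timeit
    import numpy as np

    timing_number = 10
    timing_repeat = 10
    timings = (
        np.array(
            timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number)
        )
        * 1000
        / timing_number  # convert to milliseconds per inference
    )
    stats = {"mean": np.mean(timings), "median": np.median(timings), "std": np.std(timings)}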
@@ -963,7 +963,7 @@ models.</p>
 <p>Here we presented a simple example using ResNet-50 v2 locally. However, TVM
 supports many more features including cross-compilation, remote execution and
 profiling/benchmarking.</p>
-<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  20.581 seconds)</p>
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 10 minutes  25.060 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-autotvm-relay-x86-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/57a45d9bef1af358191e7d50043e652c/autotvm_relay_x86.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">autotvm_relay_x86.py</span></code></a></p>
diff --git a/docs/tutorial/cross_compilation_and_rpc.html b/docs/tutorial/cross_compilation_and_rpc.html
index f4862386a..c30951351 100644
--- a/docs/tutorial/cross_compilation_and_rpc.html
+++ b/docs/tutorial/cross_compilation_and_rpc.html
@@ -496,7 +496,7 @@ device and returns the measured cost. Network overhead is excluded.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.289e-07 secs/op
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>1.333e-07 secs/op
 </pre></div>
 </div>
 </div>
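<p>The secs/op figure changed above is measured through a remote time_evaluator, which runs the kernel on the RPC device and excludes network overhead. A sketch, assuming <code>remote</code> is an open RPC session and <code>func</code> the module uploaded through it:</p>

    # average 10 runs on the remote device; RPC round-trip time is not counted
    time_f = func.time_evaluator(func.entry_name, dev, number=10)
    cost = time_f(a, b).mean
    print("%g secs/op" % cost)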
diff --git a/docs/tutorial/intro_topi.html b/docs/tutorial/intro_topi.html
index 8a20d3c3c..7164baae2 100644
--- a/docs/tutorial/intro_topi.html
+++ b/docs/tutorial/intro_topi.html
@@ -461,7 +461,7 @@ we can schedule the following series of operations ending with <code class="code
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0xc364c90)), stage(b, placeholder(b, 0xc320ce0)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[it [...]
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>[stage(a, placeholder(a, 0x1b0ab4b0)), stage(b, placeholder(b, 0x22413b70)), stage(T_add, compute(T_add, body=[(a[ax0, ax1, ax2] + b[ax1, ax2])], axis=[iter_var(ax0, range(min=0, ext=100)), iter_var(ax1, range(min=0, ext=10)), iter_var(ax2, range(min=0, ext=10))], reduce_axis=[], tag=broadcast, attrs={})), stage(T_multiply, compute(T_multiply, body=[(a[ax0, ax1, ax2]*b[ax1, ax2])], axis=[ [...]
 </pre></div>
 </div>
 <p>We can test the correctness by comparing with <code class="code docutils literal notranslate"><span class="pre">numpy</span></code> result as follows</p>
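<p>A sketch of that numpy comparison, with shapes inferred from the stage dump above (a is (100, 10, 10), b is (10, 10); <code>c</code> and <code>d</code> are assumed to hold the results of the T_add and T_multiply stages):</p>

    import numpy as np

    a_np = np.random.uniform(size=(100, 10, 10)).astype("float32")
    b_np = np.random.uniform(size=(10, 10)).astype("float32")
    # T_add and T_multiply are broadcast add/multiply over the trailing axes
    np.testing.assert_allclose(c.numpy(), a_np + b_np, rtol=1e-5)
    np.testing.assert_allclose(d.numpy(), a_np * b_np, rtol=1e-5)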
diff --git a/docs/tutorial/sg_execution_times.html b/docs/tutorial/sg_execution_times.html
index 3c35561ec..23b59bb90 100644
--- a/docs/tutorial/sg_execution_times.html
+++ b/docs/tutorial/sg_execution_times.html
@@ -300,20 +300,20 @@
             
   <div class="section" id="computation-times">
 <span id="sphx-glr-tutorial-sg-execution-times"></span><h1>Computation times<a class="headerlink" href="#computation-times" title="Permalink to this headline">¶</a></h1>
-<p><strong>13:19.803</strong> total execution time for <strong>tutorial</strong> files:</p>
+<p><strong>13:08.446</strong> total execution time for <strong>tutorial</strong> files:</p>
 <ul class="simple">
-<li><p><strong>10:20.581</strong>: <a class="reference internal" href="autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py"><span class="std std-ref">Compiling and Optimizing a Model with the Python Interface (AutoTVM)</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_relay_x86.py</span></code>)</p></li>
-<li><p><strong>01:07.017</strong>: <a class="reference internal" href="auto_scheduler_matmul_x86.html#sphx-glr-tutorial-auto-scheduler-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Auto-scheduling</span></a> (<code class="docutils literal notranslate"><span class="pre">auto_scheduler_matmul_x86.py</span></code>)</p></li>
-<li><p><strong>00:59.416</strong>: <a class="reference internal" href="tensor_expr_get_started.html#sphx-glr-tutorial-tensor-expr-get-started-py"><span class="std std-ref">Working with Operators Using Tensor Expression</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_expr_get_started.py</span></code>)</p></li>
-<li><p><strong>00:27.995</strong>: <a class="reference internal" href="relay_quick_start.html#sphx-glr-tutorial-relay-quick-start-py"><span class="std std-ref">Quick Start Tutorial for Compiling Deep Learning Models</span></a> (<code class="docutils literal notranslate"><span class="pre">relay_quick_start.py</span></code>)</p></li>
-<li><p><strong>00:23.214</strong>: <a class="reference internal" href="autotvm_matmul_x86.html#sphx-glr-tutorial-autotvm-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Schedule Templates and AutoTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_matmul_x86.py</span></code>)</p></li>
-<li><p><strong>00:00.715</strong>: <a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></li>
-<li><p><strong>00:00.536</strong>: <a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></li>
-<li><p><strong>00:00.191</strong>: <a class="reference internal" href="cross_compilation_and_rpc.html#sphx-glr-tutorial-cross-compilation-and-rpc-py"><span class="std std-ref">Cross Compilation and RPC</span></a> (<code class="docutils literal notranslate"><span class="pre">cross_compilation_and_rpc.py</span></code>)</p></li>
-<li><p><strong>00:00.038</strong>: <a class="reference internal" href="introduction.html#sphx-glr-tutorial-introduction-py"><span class="std std-ref">Introduction</span></a> (<code class="docutils literal notranslate"><span class="pre">introduction.py</span></code>)</p></li>
-<li><p><strong>00:00.035</strong>: <a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></li>
-<li><p><strong>00:00.033</strong>: <a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Started using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></li>
-<li><p><strong>00:00.032</strong>: <a class="reference internal" href="install.html#sphx-glr-tutorial-install-py"><span class="std std-ref">Installing TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">install.py</span></code>)</p></li>
+<li><p><strong>10:25.060</strong>: <a class="reference internal" href="autotvm_relay_x86.html#sphx-glr-tutorial-autotvm-relay-x86-py"><span class="std std-ref">Compiling and Optimizing a Model with the Python Interface (AutoTVM)</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_relay_x86.py</span></code>)</p></li>
+<li><p><strong>01:00.791</strong>: <a class="reference internal" href="tensor_expr_get_started.html#sphx-glr-tutorial-tensor-expr-get-started-py"><span class="std std-ref">Working with Operators Using Tensor Expression</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_expr_get_started.py</span></code>)</p></li>
+<li><p><strong>00:49.167</strong>: <a class="reference internal" href="auto_scheduler_matmul_x86.html#sphx-glr-tutorial-auto-scheduler-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Auto-scheduling</span></a> (<code class="docutils literal notranslate"><span class="pre">auto_scheduler_matmul_x86.py</span></code>)</p></li>
+<li><p><strong>00:28.130</strong>: <a class="reference internal" href="relay_quick_start.html#sphx-glr-tutorial-relay-quick-start-py"><span class="std std-ref">Quick Start Tutorial for Compiling Deep Learning Models</span></a> (<code class="docutils literal notranslate"><span class="pre">relay_quick_start.py</span></code>)</p></li>
+<li><p><strong>00:23.603</strong>: <a class="reference internal" href="autotvm_matmul_x86.html#sphx-glr-tutorial-autotvm-matmul-x86-py"><span class="std std-ref">Optimizing Operators with Schedule Templates and AutoTVM</span></a> (<code class="docutils literal notranslate"><span class="pre">autotvm_matmul_x86.py</span></code>)</p></li>
+<li><p><strong>00:00.750</strong>: <a class="reference internal" href="intro_topi.html#sphx-glr-tutorial-intro-topi-py"><span class="std std-ref">Introduction to TOPI</span></a> (<code class="docutils literal notranslate"><span class="pre">intro_topi.py</span></code>)</p></li>
+<li><p><strong>00:00.568</strong>: <a class="reference internal" href="tensor_ir_blitz_course.html#sphx-glr-tutorial-tensor-ir-blitz-course-py"><span class="std std-ref">Blitz Course to TensorIR</span></a> (<code class="docutils literal notranslate"><span class="pre">tensor_ir_blitz_course.py</span></code>)</p></li>
+<li><p><strong>00:00.211</strong>: <a class="reference internal" href="cross_compilation_and_rpc.html#sphx-glr-tutorial-cross-compilation-and-rpc-py"><span class="std std-ref">Cross Compilation and RPC</span></a> (<code class="docutils literal notranslate"><span class="pre">cross_compilation_and_rpc.py</span></code>)</p></li>
+<li><p><strong>00:00.043</strong>: <a class="reference internal" href="introduction.html#sphx-glr-tutorial-introduction-py"><span class="std std-ref">Introduction</span></a> (<code class="docutils literal notranslate"><span class="pre">introduction.py</span></code>)</p></li>
+<li><p><strong>00:00.042</strong>: <a class="reference internal" href="install.html#sphx-glr-tutorial-install-py"><span class="std std-ref">Installing TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">install.py</span></code>)</p></li>
+<li><p><strong>00:00.041</strong>: <a class="reference internal" href="tvmc_command_line_driver.html#sphx-glr-tutorial-tvmc-command-line-driver-py"><span class="std std-ref">Compiling and Optimizing a Model with TVMC</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_command_line_driver.py</span></code>)</p></li>
+<li><p><strong>00:00.039</strong>: <a class="reference internal" href="tvmc_python.html#sphx-glr-tutorial-tvmc-python-py"><span class="std std-ref">Getting Started using TVMC Python: a high-level API for TVM</span></a> (<code class="docutils literal notranslate"><span class="pre">tvmc_python.py</span></code>)</p></li>
 </ul>
 </div>
 
diff --git a/docs/tutorial/tensor_expr_get_started.html b/docs/tutorial/tensor_expr_get_started.html
index 9ce453960..14254e22f 100644
--- a/docs/tutorial/tensor_expr_get_started.html
+++ b/docs/tutorial/tensor_expr_get_started.html
@@ -512,8 +512,8 @@ helper function to run a profile of the TVM generated code.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.000007
-naive: 0.000008
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.000008
+naive: 0.000006
 </pre></div>
 </div>
 </div>
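<p>The "naive:" figure above comes from the tutorial's profiling helper; a self-contained sketch (the helper's name and signature are assumptions):</p>

    import numpy as np
    import tvm

    def evaluate_addition(func, dev, n, name, log):
        a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
        b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
        c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
        # average over 10 invocations, like the numbers reported in this hunk
        evaluator = func.time_evaluator(func.entry_name, dev, number=10)
        mean_time = evaluator(a, b, c).mean
        print("%s: %f" % (name, mean_time))
        log.append((name, mean_time))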
@@ -564,7 +564,7 @@ compile and run this new schedule with the parallel operation applied:</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000009
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallel: 0.000006
 </pre></div>
 </div>
 </div>
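<p>The parallel schedule measured here only marks the outer axis parallel; a sketch that reuses the A, B, C tensors and tgt target defined earlier in the tutorial:</p>

    import tvm
    from tvm import te

    s = te.create_schedule(C.op)
    s[C].parallel(C.op.axis[0])  # spread loop iterations across CPU threads
    fadd_parallel = tvm.build(s, [A, B, C], tgt, name="fadd_parallel")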
@@ -638,10 +638,10 @@ factor to be the number of threads on your CPU.</p>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Operator                  Timing             Performance
-   numpy    7.089529999575461e-06                    1.0
-   naive              7.9637e-06       1.123304365800961
-parallel              8.8924e-06      1.2543003556699102
-  vector    2.4648300000000003e-05     3.476718485072495
+   numpy    8.059339997998904e-06                    1.0
+   naive    5.967000000000001e-06     0.7403832077417728
+parallel    6.3133999999999996e-06    0.7833643947975376
+  vector    2.4610499999999998e-05    3.0536619631521527
 </pre></div>
 </div>
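<p>The "vector" row in the table above splits the loop and vectorizes the inner part; a sketch (the split factor is an assumption, and the hunk header above suggests matching it to the number of threads on your CPU):</p>

    factor = 4  # assumed; tune to your hardware
    outer, inner = s[C].split(C.op.axis[0], factor=factor)
    s[C].parallel(outer)
    s[C].vectorize(inner)  # emit SIMD instructions for the inner loop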
 <div class="admonition-code-specialization admonition">
@@ -959,7 +959,7 @@ matrix multiplication.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.017892
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>Numpy running time: 0.018763
 </pre></div>
 </div>
 <p>Now we write a basic matrix multiplication using TVM TE and verify that it
@@ -1003,7 +1003,7 @@ optimizations.</p>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>/workspace/python/tvm/driver/build_module.py:264: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
   &quot;target_host parameter is going to be deprecated. &quot;
-none: 3.300033
+none: 3.382148
 </pre></div>
 </div>
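<p>The "none" figure above is the unscheduled TE matmul; a sketch with the same M = K = N = 1024 assumption:</p>

    import tvm
    from tvm import te

    M = K = N = 1024
    k = te.reduce_axis((0, K), "k")
    A = te.placeholder((M, K), name="A")
    B = te.placeholder((K, N), name="B")
    C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")
    s = te.create_schedule(C.op)  # default schedule: three naive nested loops
    func = tvm.build(s, [A, B, C], target="llvm", name="mmult")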
 <p>Let’s take a look at the intermediate representation of the operator and
@@ -1070,7 +1070,7 @@ schedule.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.300489
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>blocking: 0.297685
 </pre></div>
 </div>
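<p>The blocking schedule tiles the output into bn x bn blocks and splits the reduction so each block stays in cache; a sketch based on the tutorial:</p>

    bn = 32
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    (kaxis,) = s[C].op.reduce_axis
    ko, ki = s[C].split(kaxis, factor=4)
    s[C].reorder(mo, no, ko, ki, mi, ni)  # iterate blocks first, elements last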
 <p>By reordering the computation to take advantage of caching, you should see a
@@ -1131,7 +1131,7 @@ already cache friendly from our previous optimizations.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.339779
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>vectorization: 0.337189
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
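<p>Vectorization adds a single directive on top of the blocked schedule, asking the compiler to emit SIMD code for the innermost contiguous axis; a sketch (axis names as in the blocking sketch above):</p>

    s[C].vectorize(ni)  # the innermost bn-wide axis maps onto vector lanes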
@@ -1187,7 +1187,7 @@ more cache friendly.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.112024
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>loop permutation: 0.119975
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
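<p>Loop permutation reorders the inner loops so A is read row by row instead of column by column; a sketch (same axis names as above):</p>

    s[C].reorder(mo, no, ko, mi, ki, ni)  # mi outside ki for better locality
    s[C].vectorize(ni)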
@@ -1264,7 +1264,7 @@ optimized schedule.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.108253
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>array packing: 0.110986
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
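<p>Array packing rewrites B into a [N / bn][K][bn] layout so the innermost loop reads it sequentially; a sketch based on the tutorial:</p>

    packedB = te.compute(
        (N // bn, K, bn), lambda bigN, kk, littleN: B[kk, bigN * bn + littleN], name="packedB"
    )
    C = te.compute(
        (M, N),
        lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, n % bn], axis=k),
        name="C",
    )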
@@ -1339,7 +1339,7 @@ to <cite>C</cite> when all the block results are ready.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.110626
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>block caching: 0.111675
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
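<p>Block caching accumulates each tile in a local write buffer and copies it out to C once the whole block is ready; a sketch:</p>

    CC = s.cache_write(C, "global")
    mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
    s[CC].compute_at(s[C], no)  # fill the cached block inside the tile loops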
@@ -1407,7 +1407,7 @@ of thread-level parallelization.</p>
 </pre></div>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
-<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.145039
+<div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>parallelization: 0.144948
 @main = primfn(A_1: handle, B_1: handle, C_1: handle) -&gt; ()
   attr = {&quot;from_legacy_te_schedule&quot;: True, &quot;global_symbol&quot;: &quot;main&quot;, &quot;tir.noalias&quot;: True}
   buffers = {A: Buffer(A_2: Pointer(float32), float32, [1048576], []),
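<p>The final step simply parallelizes the outermost blocked axis; a sketch:</p>

    s[C].parallel(mo)  # one thread per row of blocks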
@@ -1470,13 +1470,13 @@ working, we can compare the results.</p>
 </div>
 <p class="sphx-glr-script-out">Out:</p>
 <div class="sphx-glr-script-out highlight-none notranslate"><div class="highlight"><pre><span></span>        Operator                  Timing             Performance
-            none      3.3000331350000005                     1.0
-        blocking            0.3004887988      0.0910562974695707
-   vectorization     0.33977934639999996     0.10296240446688724
-loop permutation            0.1120241604    0.033946374420267746
-   array packing     0.10825315049999999    0.032803655621476045
-   block caching            0.1106262238    0.033522761522211196
- parallelization            0.1450390134     0.04395077487608317
+            none      3.3821478262999998                     1.0
+        blocking            0.2976846373      0.0880164477096972
+   vectorization            0.3371885208     0.09969656505785476
+loop permutation     0.11997511350000001     0.03547305430207949
+   array packing     0.11098600200000001     0.03281524276879891
+   block caching     0.11167461740000002     0.03301884575582545
+ parallelization            0.1449481649    0.042856839010070806
 </pre></div>
 </div>
 <p>Note that the outputs on the web page reflect the running times on a
@@ -1508,6 +1508,7 @@ is</p>
you can build generic templates of matrix multiplication and other
operations with tunable parameters that allow you to automatically optimize
the computation for specific platforms.</p>
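<p>A minimal sketch of such a tunable template in AutoTVM style (the template name and knob names are assumptions):</p>

    from tvm import autotvm, te

    @autotvm.template("tutorial/matmul")
    def matmul(M, K, N, dtype):
        A = te.placeholder((M, K), name="A", dtype=dtype)
        B = te.placeholder((K, N), name="B", dtype=dtype)
        k = te.reduce_axis((0, K), name="k")
        C = te.compute((M, N), lambda m, n: te.sum(A[m, k] * B[k, n], axis=k), name="C")
        s = te.create_schedule(C.op)
        m, n = s[C].op.axis
        cfg = autotvm.get_config()
        cfg.define_split("tile_m", m, num_outputs=2)  # tunable tile sizes
        cfg.define_split("tile_n", n, num_outputs=2)
        mo, mi = cfg["tile_m"].apply(s, C, m)
        no, ni = cfg["tile_n"].apply(s, C, n)
        s[C].reorder(mo, no, mi, ni)
        return s, [A, B, C]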
+<p class="sphx-glr-timing"><strong>Total running time of the script:</strong> ( 1 minutes  0.791 seconds)</p>
 <div class="sphx-glr-footer class sphx-glr-footer-example docutils container" id="sphx-glr-download-tutorial-tensor-expr-get-started-py">
 <div class="sphx-glr-download docutils container">
 <p><a class="reference download internal" download="" href="../_downloads/40a01cffb015a67aaec0fad7e27cf80d/tensor_expr_get_started.py"><code class="xref download docutils literal notranslate"><span class="pre">Download</span> <span class="pre">Python</span> <span class="pre">source</span> <span class="pre">code:</span> <span class="pre">tensor_expr_get_started.py</span></code></a></p>