Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/01 18:53:10 UTC
[GitHub] [tvm] comaniac commented on a change in pull request #7767: [docs] Getting Started with TVM: Auto Tuning with Python

comaniac commented on a change in pull request #7767:
URL: https://github.com/apache/tvm/pull/7767#discussion_r605865288



##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime

Review comment:
       ```suggestion
   from tvm.contrib import graph_executor
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute image for any of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array

Review comment:
       ```suggestion
   # Our input image is in HWC layout while ONNX expects CHW input, so convert the array
   ```
   It's a bit odd to say "NCHW" here because the image has only three dimensions at this point; the batch dimension is added later.
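
To illustrate the layout point in this comment, here is a minimal NumPy sketch. The stand-in array and the names `hwc`, `chw`, and `nchw` are assumptions for illustration and are not part of the PR; only the `np.transpose(..., (2, 0, 1))` and `np.expand_dims(..., axis=0)` calls mirror the tutorial code.

```python
import numpy as np

# Stand-in for the resized RGB image: height x width x channels (HWC).
hwc = np.arange(224 * 224 * 3, dtype="float32").reshape(224, 224, 3)

# Move the channel axis to the front: HWC -> CHW. Still a 3-D array.
chw = np.transpose(hwc, (2, 0, 1))

# Only after adding the batch dimension does the tensor become 4-D NCHW.
nchw = np.expand_dims(chw, axis=0)

print(hwc.shape, chw.shape, nchw.shape)
```

The transpose alone yields a 3-D CHW array, which is why the suggested wording avoids the batch letter "N" at this step.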

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve

Review comment:
       ```suggestion
   it is done with the Python API. Upon completion of this section, we will have
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute image for any of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):

Review comment:
       Any reason for not using opt_level=3?

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute image for any of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basis statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------

Review comment:
       ```suggestion
   # ----------------------
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. In the case of an ONNX model, you can then
+# load it into memory using the ``onnx`` Python package.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre- and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute any image of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# into Relay using the ``from_onnx`` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and setting the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like Netron to check input names.
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches over multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform-specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning records file.
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds

Review comment:
   1. Do we have a pointer that explains what those parameters are for? If not, we may need to explain them here, especially `min_repeat_ms`. It's not clear why this can be 0 on CPU, and it raises the question of what the ideal number would be for GPU.
   2. These parameters are for the runner, so it would be better to move them down and keep them together with it.
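
On the first point, a rough pure-Python sketch of how the runner parameters interact may help. It follows the documented semantics of `autotvm.LocalRunner` (one untimed warm-up run, then `repeat` measurements of `number` runs each, with `number` raised until one measurement lasts at least `min_repeat_ms`), but it is not TVM's implementation: the `measure` helper and its doubling policy are assumptions made purely for illustration.

```python
import time


def measure(func, number=10, repeat=1, min_repeat_ms=0):
    """Return the mean time per run (in seconds), one entry per repeat.

    Hypothetical helper that mimics the documented meaning of the
    LocalRunner parameters; it is not the TVM implementation.
    """
    func()  # warm-up run, discarded
    results = []
    for _rep in range(repeat):
        runs = number
        while True:
            start = time.perf_counter()
            for _run in range(runs):
                func()
            elapsed = time.perf_counter() - start
            # With min_repeat_ms == 0 this is always true, so `runs`
            # stays equal to `number`.
            if elapsed * 1000 >= min_repeat_ms:
                break
            runs *= 2  # assumed growth policy, for illustration only
        results.append(elapsed / runs)
    return results


# One measurement of 10 runs, as in the tutorial's settings.
print(measure(lambda: sum(range(1000)), number=10, repeat=1, min_repeat_ms=0))
```

With `min_repeat_ms=0` the run count stays at `number`, which, as I understand it, is usually fine on a CPU where a single run is long relative to the timer resolution. On a GPU a kernel can be short enough that a nonzero value is needed to get a stable measurement.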

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+
+# begin by extracting the tasks from the onnx model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value
+# is set to a small number here in the interest of having faster
+# running time.
+tuning_option = {
+    "tuner": "xgb",
+    "trials": 10,
+    "early_stopping": 100,
+    "measure_option": autotvm.measure_option(
+        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
+    ),
+    "tuning_records": "resnet-50-v2-autotuning.json",
+}
+
+################################################################################
+# .. note:: Defining the Tuning Search Algorithm
+#
+#   By default this search is guided using an ``XGBoost Grid`` algorithm.
+#   Depending on your model complexity and amount of time available, you might
+#   want to choose a different algorithm.
+
+
+################################################################################
+# .. note:: Setting Tuning Parameters
+#
+#   In this example, in the interest of time, we set the number of trials to
+#   10 and early stopping to 100. You will likely see more performance
+#   improvements if you set these values to be higher, but this comes at the
+#   expense of time spent tuning. The number of trials required for convergence
+#   will vary depending on the specifics of the model and the target platform.
+
+# Identify the tasks that can be tuned, and iterate through them
+for i, task in enumerate(tasks):
+    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
+    tuner_obj = XGBTuner(task, loss_type="rank")
+    tuner_obj.tune(
+        n_trial=min(tuning_option["trials"], len(task.config_space)),
+        early_stopping=tuning_option["early_stopping"],
+        measure_option=tuning_option["measure_option"],
+        callbacks=[
+            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
+            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
+        ],
+    )
+
+################################################################################
+# The output from this tuning process will look something like this:
+#
+# .. code-block:: bash
+#
+#   # [Task  1/24]  Current/Best:   10.71/  21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.
+#   # [Task  1/24]  Current/Best:    9.32/  24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.
+#   # [Task  2/24]  Current/Best:   22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.
+#   # [Task  3/24]  Current/Best:   32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done.
+#   # [Task  4/24]  Current/Best:   11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done.
+#   # [Task  5/24]  Current/Best:   23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done.
+#   # [Task  6/24]  Current/Best:   38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done.
+#   # [Task  7/24]  Current/Best:    4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done.
+#   # [Task  8/24]  Current/Best:   50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done.
+#   # [Task  9/24]  Current/Best:   50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done.
+#   # [Task 10/24]  Current/Best:   12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done.
+#   # [Task 11/24]  Current/Best:   62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done.
+#   # [Task 12/24]  Current/Best:   36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done.
+#   # [Task 13/24]  Current/Best:    7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done.
+#   # [Task 14/24]  Current/Best:   12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done.
+#   # [Task 15/24]  Current/Best:   31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done.
+#   # [Task 16/24]  Current/Best:   31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done.
+#   # [Task 17/24]  Current/Best:   41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done.
+#   # [Task 18/24]  Current/Best:   15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done.
+#   # [Task 19/24]  Current/Best:   15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done.
+#   # [Task 20/24]  Current/Best:   30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done.
+#   # [Task 21/24]  Current/Best:   46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done.
+#   # [Task 22/24]  Current/Best:   13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done.
+#   # [Task 23/24]  Current/Best:   53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done.
+#   # [Task 24/24]  Current/Best:   25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done.
+
+################################################################################
+# Compiling an Optimized Model with Tuning Data
+# ----------------------------------------------
+#
+# As an output of the tuning process above, we obtained the tuning records
+# stored in ``resnet-50-v2-autotuning.json``. The compiler will use the results to
+# generate high performance code for the model on your specified target.
+#
+# Now that tuning data for the model has been collected, we can re-compile the
+# model using optimized operators to speed up our computations.
+
+with autotvm.apply_history_best(tuning_option["tuning_records"]):
+    with tvm.transform.PassContext(opt_level=3, config={}):
+        lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))

Review comment:
       ```suggestion
   module = graph_executor.GraphModule(lib["default"](dev))
   ```
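
If the tutorial also has to run against TVM builds that predate the `graph_runtime` to `graph_executor` rename, one option is to fall back to the old module name at import time. A minimal sketch, assuming a hypothetical `import_first` helper (not part of TVM); the commented-out line shows how it might be used with the two module names from this thread:

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate name
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )


# Hypothetical usage in the tutorial (requires TVM to be installed):
# graph_executor = import_first(
#     "tvm.contrib.graph_executor", "tvm.contrib.graph_runtime"
# )
```

If only current TVM versions need to be supported, the plain import of `graph_executor` is simpler and the helper is unnecessary.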

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and setting the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ----------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# --------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning records file.
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds
+
+# begin by extracting the tasks from the onnx model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value
+# is set to a small number here in the interest of having faster
+# running time.
+tuning_option = {
+    "tuner": "xgb",
+    "trials": 10,
+    "early_stopping": 100,
+    "measure_option": autotvm.measure_option(
+        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
+    ),
+    "tuning_records": "resnet-50-v2-autotuning.json",
+}
+
+################################################################################
+# .. note:: Defining the Tuning Search Algorithm
+#
+#   By default this search is guided using an `XGBoost Grid` algorithm.
+#   Depending on your model complexity and amount of time available, you might
+#   want to choose a different algorithm.
+
+
+################################################################################
+# .. note:: Setting Tuning Parameters
+#
+#   In this example, in the interest of time, we set the number of trials to
+#   10 and early stopping to 100. You will likely see more performance
+#   improvements if you set these values to be higher, but this comes at the
+#   expense of time spent tuning. The number of trials required for convergence
+#   will vary depending on the specifics of the model and the target platform.
+
+# Identify the tasks that can be tuned, and iterate through them
+for i, task in enumerate(tasks):
+    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
+    tuner_obj = XGBTuner(task, loss_type="rank")
+    tuner_obj.tune(
+        n_trial=min(tuning_option["trials"], len(task.config_space)),
+        early_stopping=tuning_option["early_stopping"],
+        measure_option=tuning_option["measure_option"],
+        callbacks=[
+            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
+            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
+        ],
+    )
+
+################################################################################
+# The output from this tuning process will look something like this:
+#
+# .. code-block:: bash
+#
+#   # [Task  1/24]  Current/Best:   10.71/  21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.
+#   # [Task  1/24]  Current/Best:    9.32/  24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.
+#   # [Task  2/24]  Current/Best:   22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.
+#   # [Task  3/24]  Current/Best:   32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done.
+#   # [Task  4/24]  Current/Best:   11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done.
+#   # [Task  5/24]  Current/Best:   23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done.
+#   # [Task  6/24]  Current/Best:   38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done.
+#   # [Task  7/24]  Current/Best:    4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done.
+#   # [Task  8/24]  Current/Best:   50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done.
+#   # [Task  9/24]  Current/Best:   50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done.
+#   # [Task 10/24]  Current/Best:   12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done.
+#   # [Task 11/24]  Current/Best:   62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done.
+#   # [Task 12/24]  Current/Best:   36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done.
+#   # [Task 13/24]  Current/Best:    7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done.
+#   # [Task 14/24]  Current/Best:   12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done.
+#   # [Task 15/24]  Current/Best:   31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done.
+#   # [Task 16/24]  Current/Best:   31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done.
+#   # [Task 17/24]  Current/Best:   41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done.
+#   # [Task 18/24]  Current/Best:   15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done.
+#   # [Task 19/24]  Current/Best:   15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done.
+#   # [Task 20/24]  Current/Best:   30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done.
+#   # [Task 21/24]  Current/Best:   46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done.
+#   # [Task 22/24]  Current/Best:   13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done.
+#   # [Task 23/24]  Current/Best:   53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done.
+#   # [Task 24/24]  Current/Best:   25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done.
+
+################################################################################
+# Compiling an Optimized Model with Tuning Data
+# ----------------------------------------------
+#
+# As an output of the tuning process above, we obtained the tuning records
+# stored in ``resnet-50-v2-autotuning.json``. The compiler will use the results to
+# generate high performance code for the model on your specified target.
+#
+# Now that tuning data for the model has been collected, we can re-compile the
+# model using optimized operators to speed up our computations.
+
+with autotvm.apply_history_best(tuning_option["tuning_records"]):
+    with tvm.transform.PassContext(opt_level=3, config={}):

Review comment:
       Ack to my previous comment. If we use opt_level=3 here, the comparison to the untuned model (built with opt_level=1 above) is unfair.
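
       One way to keep the comparison fair would be to build both modules with the same opt_level and to time them with the same helper and the same settings. A minimal sketch of such a helper follows, using only the standard library. The name `summarize_ms` and the stand-in workload are hypothetical, not part of TVM or of this PR; in the tutorial the callable would be `lambda: module.run()` for each module.

```python
import statistics
import timeit


def summarize_ms(fn, number=10, repeat=10):
    """Time fn and return per-call mean/median/std in milliseconds."""
    # Each entry of `totals` is the wall time in seconds for `number` calls.
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_call_ms = [t * 1000 / number for t in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        # Population standard deviation, the same quantity np.std returns by default.
        "std": statistics.pstdev(per_call_ms),
    }


# Stand-in workload (an assumption for illustration only).
baseline = summarize_ms(lambda: sum(range(1000)))
print(baseline)
```

       Calling the same helper for the untuned and the tuned module keeps `number` and `repeat` identical, so the only remaining difference is the tuning log.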

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and setting the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))

Review comment:
       ```suggestion
   module = graph_executor.GraphModule(lib["default"](dev))
   ```
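
       Note that the rename also affects the `from tvm.contrib import graph_runtime` import earlier in the file. If the tutorial had to run against both older and newer TVM builds, a fallback import could look roughly like the sketch below. The helper `first_importable` is a hypothetical name introduced here for illustration, and the snippet assumes nothing beyond the two module names already mentioned in this review.

```python
import importlib


def first_importable(names):
    """Return the first module in names that imports cleanly, else None."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the new module name and fall back to the old one.
graph_mod = first_importable(
    ["tvm.contrib.graph_executor", "tvm.contrib.graph_runtime"]
)
print(graph_mod.__name__ if graph_mod else "TVM is not installed")
```

       For the tutorial itself, simply switching both the import and the call to `graph_executor` is probably the cleaner choice.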

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.

Review comment:
       ```suggestion
   # Graph Executor.
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension

Review comment:
       ```suggestion
   # Add the batch dimension, as we are expecting 4-dimensional input: NCHW.
   ```
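
       To make the shapes concrete, here is a small NumPy-only sketch of the same preprocessing. It uses a zero-filled stand-in for the resized image (an assumption, so that no download is needed) and broadcasting in place of the per-channel loop; the mean and standard deviation values are the ones from the tutorial.

```python
import numpy as np

# Stand-in for the resized RGB image: height x width x channels (HWC).
hwc = np.zeros((224, 224, 3), dtype="float32")

# HWC -> CHW, matching the transpose in the tutorial.
chw = np.transpose(hwc, (2, 0, 1))

# Per-channel ImageNet normalization via broadcasting instead of a loop.
mean = np.array([0.485, 0.456, 0.406], dtype="float32").reshape(3, 1, 1)
stddev = np.array([0.229, 0.224, 0.225], dtype="float32").reshape(3, 1, 1)
norm = (chw / 255 - mean) / stddev

# Add the leading batch axis N, giving the 4-dimensional NCHW input.
nchw = np.expand_dims(norm, axis=0)
print(chw.shape, nchw.shape)
```

       The array goes from three dimensions (CHW) to four (NCHW) with a batch size of 1, which is what the suggested comment spells out.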

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Imaging Library for working with the image data, ``numpy`` for
+# pre- and post-processing of the image data, the TVM Relay framework, and the
+# TVM Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------

Review comment:
       ```suggestion
   # --------------------------------------
   ```
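
The suggestion trims the underline to the length of the heading text; in reStructuredText the underline must be at least as long as the title, and matching it exactly is the usual convention. As a rough sketch of how a matching underline could be generated for the `# `-prefixed headings used in this tutorial file (`rst_comment_heading` is a hypothetical helper for illustration, not part of TVM or Sphinx):

```python
def rst_comment_heading(title: str, underline_char: str = "-") -> str:
    """Return a two-line, '# '-prefixed heading whose underline matches the title length."""
    return f"# {title}\n# {underline_char * len(title)}"


# Example: regenerate the heading that this review comment refers to.
print(rst_comment_heading("Downloading and Loading the ONNX Model"))
```

Deriving the underline from `len(title)` keeps the two lines in sync when a heading is reworded later.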

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2, using the command line interface for
+TVM, TVMC. TVM is more than just a command-line tool, though: it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will have
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilities and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Imaging Library for working with the image data, ``numpy`` for
+# pre- and post-processing of the image data, the TVM Relay framework, and the
+# TVM Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50 layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested in exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`_, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For an ONNX model, you can then load it into
+# memory using the ``onnx`` package.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats, and
+# data types. For this reason, most models require some pre- and
+# post-processing to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute any image of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to Relay using the ``from_onnx`` importer. We then build the model, with
+# standard optimizations, into a TVM library. Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and setting the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for

Review comment:
       ```suggestion
   # unoptimized model and compare it to a tuned model later. To help account for
   ```
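
For context, the paragraph this suggestion touches introduces the tutorial's timing step, which later in the file calls `timeit.Timer(...).repeat(repeat=..., number=...)`. That call returns one total time in seconds per repetition, so each entry is divided by `number` and multiplied by 1000 to get milliseconds per run. A minimal stdlib-only sketch of that arithmetic follows; `workload` is a hypothetical stand-in for `module.run()`, and the `statistics` module stands in for the NumPy calls used in the tutorial:

```python
import statistics
import timeit


def workload():
    # Hypothetical stand-in for module.run(); any zero-argument callable works.
    sum(i * i for i in range(1000))


def time_ms_per_run(fn, repeat=10, number=10):
    """Time fn and summarize the per-run latency in milliseconds."""
    # repeat() returns `repeat` totals, each covering `number` calls, in seconds.
    totals = timeit.Timer(fn).repeat(repeat=repeat, number=number)
    per_run_ms = [t * 1000 / number for t in totals]
    return {
        "mean": statistics.mean(per_run_ms),
        "median": statistics.median(per_run_ms),
        # Population standard deviation, matching np.std's default (ddof=0).
        "std": statistics.pstdev(per_run_ms),
    }


stats = time_ms_per_run(workload)
print(stats)
```

Running several repetitions and reporting the median alongside the mean makes a single noisy repetition less likely to skew the comparison between the unoptimized and tuned models.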

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50 layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested in exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`_, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For an ONNX model, you can then load it into
+# memory using the ``onnx`` package.

Review comment:
       Do we have a pointer if readers are interested in loading models from other frameworks?

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we

Review comment:
       ```suggestion
   # predictions with it. To use TVM to run the model and make predictions, we
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------

Review comment:
       ```suggestion
   # --------------
   ```
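
The suggestion above shortens the underline so that it matches the length of the title. A minimal sketch of how that check could be automated for sphinx-gallery style comment blocks follows. The helper name `find_mismatched_underlines` and the sample text are hypothetical and not part of the TVM tooling, and the sketch assumes it is run on the tutorial source itself rather than on the `+`-prefixed diff.

```python
import re

# Hypothetical helper (not part of TVM): flags comment-style RST headings
# whose underline is not the same length as the title above it.
UNDERLINE = re.compile(r"^# (-{3,}|={3,})\s*$")


def find_mismatched_underlines(source):
    """Return (line_number, title, underline_length) for each mismatch."""
    lines = source.splitlines()
    mismatches = []
    for idx in range(1, len(lines)):
        match = UNDERLINE.match(lines[idx])
        title_line = lines[idx - 1]
        # Only consider an underline that directly follows a "# Title" line.
        if not match or not title_line.startswith("# "):
            continue
        title = title_line[2:].rstrip()
        if not title or set(title) <= set("-=#"):
            continue
        underline = match.group(1)
        if len(underline) != len(title):
            mismatches.append((idx + 1, title, len(underline)))
    return mismatches


# Sample input: one over-long underline and one that already matches.
sample = "\n".join(
    [
        "#" * 80,
        "# Tune the model",
        "# " + "-" * 45,
        "#" * 80,
        "# Collect Basic Performance Data",
        "# " + "-" * 30,
    ]
)

for line_no, title, length in find_mismatched_underlines(sample):
    print("line %d: %r has %d characters, underline has %d" % (line_no, title, len(title), length))
```

Note that reStructuredText accepts an underline that is longer than its title, so a mismatch reported here is a style issue rather than a build error.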

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2, using the command line interface for
+TVM, TVMC. TVM is more than just a command-line tool, though; it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will have
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilities and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Imaging Library for working with the image data, ``numpy`` for pre-
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested in exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`_, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. In the case of an ONNX model, you can
+# then load it into memory using the ``onnx`` package.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre- and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute any image of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# into Relay using the ``from_onnx`` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and setting the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform-specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning  records file

Review comment:
       ```suggestion
   # The results of these runs are stored in a tuning records file.
   ```
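
This suggestion removes a stray double space from the comment prose (and adds the missing period). A rough sketch of a check that could catch doubled spaces before review is given here. It assumes prose comment lines start with `# `; the helper name `find_double_spaces` is hypothetical, and indented comment content such as directives or literal blocks is deliberately skipped.

```python
import re

# Hypothetical helper (not part of TVM): reports comment lines that contain
# two or more consecutive spaces between non-space characters.
DOUBLE_SPACE = re.compile(r"\S {2,}\S")


def find_double_spaces(source):
    """Return (line_number, text) for comment lines with doubled spaces."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Only look at full-line comments; code lines are ignored.
        if not line.startswith("# "):
            continue
        text = line[2:]
        # Skip indented comment content (e.g. directives or literal blocks).
        if text.startswith(" "):
            continue
        if DOUBLE_SPACE.search(text):
            hits.append((line_no, text))
    return hits


sample = "\n".join(
    [
        "# many different operator implementation variants to see which perform best.",
        "# The results of these runs are stored in a tuning  records file",
        "x = 1  # code lines are ignored",
    ]
)

for line_no, text in find_double_spaces(sample):
    print("line %d: %s" % (line_no, text))
```

The same pattern also flags two spaces after a sentence-ending period, so the project would need to decide whether that spacing counts as a style issue.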

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of

Review comment:
       ```suggestion
   # inferences using our compiled module. In cases like this, we can make use of
   ```

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning  records file
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds
+
+# begin by extracting the taks from the onnx model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value

Review comment:
       100 is not sufficient anyway. It would be better to suggest proper numbers for common devices such as CPU and GPU. We recommend 1500 trials for CPU, and 3000-4000 for GPU.
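
To make the suggestion concrete, here is a minimal sketch of how the tutorial could pick the trial budget from the target string. `suggested_trials`, `RECOMMENDED_TRIALS`, and `GPU_TARGET_KINDS` are hypothetical names, not TVM APIs, and the mapping from target kind to device class is an assumption for illustration. The GPU entry uses the lower bound of the 3000-4000 range.

```python
# Trial budgets taken from the recommendation above. The GPU entry uses the
# lower bound of the suggested 3000-4000 range. Everything else here is an
# illustrative assumption, not part of TVM.
RECOMMENDED_TRIALS = {"cpu": 1500, "gpu": 3000}
GPU_TARGET_KINDS = {"cuda", "rocm", "opencl", "vulkan", "metal"}


def suggested_trials(target: str) -> int:
    """Return a suggested trial count for a TVM target string (hypothetical helper)."""
    parts = target.split()
    kind = parts[0] if parts else ""
    # Any target kind not listed as a GPU backend is treated as a CPU target.
    device = "gpu" if kind in GPU_TARGET_KINDS else "cpu"
    return RECOMMENDED_TRIALS[device]


# Same keys as the tutorial's tuning_option, minus "measure_option", which
# needs autotvm and is left out so the sketch runs without TVM installed.
tuning_option = {
    "tuner": "xgb",
    "trials": suggested_trials("llvm -mcpu=skylake"),
    "early_stopping": 100,
    "tuning_records": "resnet-50-v2-autotuning.json",
}
print(tuning_option["trials"])
```

The tutorial can still run with a small value so the docs build quickly, as long as the comment points readers at these numbers for real tuning jobs.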

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute image for any of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basis statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning  records file
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler

Review comment:
       This tutorial does not use auto_scheduler, so this import is not needed here. We should still have a separate tutorial for it, as it outperforms AutoTVM and is becoming stable.
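
For reference, a rough sketch of what the auto_scheduler flow for a Relay model might look like in such a tutorial. `tune_with_auto_scheduler` is a hypothetical helper and `num_measure_trials=200` is only a placeholder budget. The `tvm.auto_scheduler` calls follow the existing network tuning tutorials as I remember them, so please verify them against the current API. Imports are deferred so that defining the helper does not require TVM.

```python
def tune_with_auto_scheduler(mod, params, target, log_file, num_measure_trials=200):
    """Tune a Relay module with auto_scheduler, then rebuild it with the best records.

    Hypothetical helper. TVM is imported lazily inside the function.
    """
    import tvm
    from tvm import auto_scheduler, relay

    # Extract the search tasks from the model, along with their weights.
    tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

    # The task scheduler divides the measurement budget across all tasks.
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=num_measure_trials,  # placeholder, not a recommendation
        runner=auto_scheduler.LocalRunner(repeat=10, enable_cpu_cache_flush=True),
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )
    tuner.tune(tune_option)

    # Rebuild the model using the best schedules found during tuning.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            return relay.build(mod, target=target, params=params)
```

Unlike the AutoTVM flow in this PR, no per-operator tuner has to be chosen. The task scheduler decides how to spend the trials.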

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will ahve
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model that model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilites and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute image for any of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basis statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning  records file
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds
+
+# begin by extracting the taks from the onnx model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value
+# is set to a small number here in the interest of having faster
+# running time.
+tuning_option = {
+    "tuner": "xgb",
+    "trials": 10,
+    "early_stopping": 100,
+    "measure_option": autotvm.measure_option(
+        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
+    ),
+    "tuning_records": "resnet-50-v2-autotuning.json",
+}
+
+################################################################################
+# .. note:: Defining the Tuning Search Algorithm
+#
+#   By default this search is guided using an `XGBoost Grid` algorithm.
+#   Depending on your model complexity and amount of time avilable, you might

Review comment:
       ```suggestion
   #   Depending on your model complexity and amount of time available, you might
   ```
   This seems a bit misleading to me. I cannot think of a situation in which random or grid search would be better than XGBTuner.
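
If the note is reworded to simply recommend XGBTuner, the tuning loop it refers to could look roughly like the sketch below. `tune_tasks_with_xgb` is a hypothetical helper. It assumes `tuning_option` has the keys defined earlier in this file, and that `tasks` comes from `autotvm.task.extract_from_program`. Imports are deferred so the definition does not need TVM installed.

```python
def tune_tasks_with_xgb(tasks, tuning_option):
    """Tune every extracted AutoTVM task with XGBTuner (hypothetical helper)."""
    from tvm import autotvm
    from tvm.autotvm.tuner import XGBTuner

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner_obj = XGBTuner(task, loss_type="rank")
        # Never request more trials than the task has configurations.
        n_trial = min(tuning_option["trials"], len(task.config_space))
        tuner_obj.tune(
            n_trial=n_trial,
            early_stopping=tuning_option["early_stopping"],
            measure_option=tuning_option["measure_option"],
            callbacks=[
                autotvm.callback.progress_bar(n_trial, prefix=prefix),
                autotvm.callback.log_to_file(tuning_option["tuning_records"]),
            ],
        )
```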

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more that just a command-line tool though, it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will have
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilities and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested in exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`_, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute any image of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning records file.
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds
+
+# begin by extracting the tasks from the ONNX model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value
+# is set to a small number here in the interest of having faster
+# running time.
+tuning_option = {
+    "tuner": "xgb",
+    "trials": 10,
+    "early_stopping": 100,
+    "measure_option": autotvm.measure_option(
+        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
+    ),
+    "tuning_records": "resnet-50-v2-autotuning.json",
+}
+
+################################################################################
+# .. note:: Defining the Tuning Search Algorithm
+#
+#   By default this search is guided using an `XGBoost Grid` algorithm.
+#   Depending on your model complexity and amount of time avilable, you might
+#   want to choose a different algorithm.
+
+
+################################################################################
+# .. note:: Setting Tuning Parameters
+#
+#   In this example, in the interest of time, we set the number of trials to 10
+#   and early stopping to 100. You will likely see more performance improvements
+#   if you set these values to be higher, but this comes at the expense of time
+#   spent tuning. The number of trials required for convergence will vary
+#   depending on the specifics of the model and the target platform.
+
+# Identify the tasks that can be tuned, and iterate through them

Review comment:
       ```suggestion
   # Tune the extracted tasks sequentially.
   ```
   There's no task identification logic here.
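
To make the point concrete: the tasks were already extracted earlier by `extract_from_program`, and the loop only walks them in order, capping each task's trial count at the size of its config space. A TVM-free sketch of that per-task plan follows. `plan_sequential_tuning` and the example space sizes are made up for illustration, and the prefix format is copied from the loop.

```python
def plan_sequential_tuning(config_space_sizes, requested_trials):
    """Return (progress prefix, n_trial) for each task, in tuning order.

    config_space_sizes stands in for len(task.config_space) of each
    extracted task; requested_trials stands in for tuning_option["trials"].
    """
    total = len(config_space_sizes)
    plan = []
    for i, space_size in enumerate(config_space_sizes):
        prefix = "[Task %2d/%2d] " % (i + 1, total)
        # A task cannot run more trials than it has candidate configs.
        n_trial = min(requested_trials, space_size)
        plan.append((prefix, n_trial))
    return plan
```

For two tasks with 5 and 1000 candidate configurations and 10 requested trials, the first task is capped at 5 trials and the second runs all 10.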

##########
File path: tutorials/get_started/auto_tuning_with_python.py
##########
@@ -0,0 +1,451 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compiling and Optimizing a Model with the Python AutoScheduler
+==============================================================
+**Author**:
+`Chris Hoge <https://github.com/hogepodge>`_
+
+In the `TVMC Tutorial <tvmc_command_line_driver>`_, we covered how to compile, run, and tune a
+pre-trained vision model, ResNet-50-v2 using the command line interface for
+TVM, TVMC. TVM is more than just a command-line tool though; it is an
+optimizing framework with APIs available for a number of different languages
+that gives you tremendous flexibility in working with machine learning models.
+
+In this tutorial we will cover the same ground we did with TVMC, but show how
+it is done with the Python API. Upon completion of this section, we will have
+used the Python API for TVM to accomplish the following tasks:
+
+* Compile a pre-trained ResNet 50 v2 model for the TVM runtime.
+* Run a real image through the compiled model, and interpret the output and model
+  performance.
+* Tune the model on a CPU using TVM.
+* Re-compile an optimized model using the tuning data collected by TVM.
+* Run the image through the optimized model, and compare the output and model
+  performance.
+
+The goal of this section is to give you an overview of TVM's capabilities and
+how to use them through the Python API.
+"""
+
+################################################################################
+# TVM is a deep learning compiler framework, with a number of different modules
+# available for working with deep learning models and operators. In this
+# tutorial we will work through how to load, compile, and optimize a model
+# using the Python API.
+#
+# We begin by importing a number of dependencies, including ``onnx`` for
+# loading and converting the model, helper utilities for downloading test data,
+# the Python Image Library for working with the image data, ``numpy`` for pre
+# and post-processing of the image data, the TVM Relay framework, and the TVM
+# Graph Runtime.
+
+import onnx
+from tvm.contrib.download import download_testdata
+from PIL import Image
+import numpy as np
+import tvm.relay as relay
+import tvm
+from tvm.contrib import graph_runtime
+
+################################################################################
+# Downloading and Loading the ONNX Model
+# ---------------------------------------------
+#
+# For this tutorial, we will be working with ResNet-50 v2. ResNet-50 is a
+# convolutional neural network that is 50-layers deep and designed to classify
+# images. The model we will be using has been pre-trained on more than a
+# million images with 1000 different classifications. The network has an input
+# image size of 224x224. If you are interested in exploring more of how the
+# ResNet-50 model is structured, we recommend downloading `Netron
+# <https://netron.app>`_, a freely available ML model viewer.
+#
+# TVM provides a helper library to download pre-trained models. By providing a
+# model URL, file name, and model type through the module, TVM will download
+# the model and save it to disk. For the instance of an ONNX model, you can
+# then load it into memory using the ONNX runtime.
+#
+
+model_url = "".join(
+    [
+        "https://github.com/onnx/models/raw/",
+        "master/vision/classification/resnet/model/",
+        "resnet50-v2-7.onnx",
+    ]
+)
+
+model_path = download_testdata(model_url, "resnet50-v2-7.onnx", module="onnx")
+onnx_model = onnx.load(model_path)
+
+################################################################################
+# Downloading, Preprocessing, and Loading the Test Image
+# ------------------------------------------------------
+#
+# Each model is particular when it comes to expected tensor shapes, formats and
+# data types. For this reason, most models require some pre and
+# post-processing, to ensure the input is valid and to interpret the output.
+# TVMC has adopted NumPy's ``.npz`` format for both input and output data.
+#
+# As input for this tutorial, we will use the image of a cat, but you can feel
+# free to substitute any image of your choosing.
+#
+# .. image:: https://s3.amazonaws.com/model-server/inputs/kitten.jpg
+#    :height: 224px
+#    :width: 224px
+#    :align: center
+#
+# Download the image data, then convert it to a numpy array to use as an input to the model.
+
+img_url = "https://s3.amazonaws.com/model-server/inputs/kitten.jpg"
+img_path = download_testdata(img_url, "imagenet_cat.png", module="data")
+
+# Resize it to 224x224
+resized_image = Image.open(img_path).resize((224, 224))
+img_data = np.asarray(resized_image).astype("float32")
+
+# ONNX expects NCHW input, so convert the array
+img_data = np.transpose(img_data, (2, 0, 1))
+
+# Normalize according to the ImageNet input specification
+imagenet_mean = np.array([0.485, 0.456, 0.406])
+imagenet_stddev = np.array([0.229, 0.224, 0.225])
+norm_img_data = np.zeros(img_data.shape).astype("float32")
+for i in range(img_data.shape[0]):
+    norm_img_data[i, :, :] = (img_data[i, :, :] / 255 - imagenet_mean[i]) / imagenet_stddev[i]
+
+# Add the batch dimension
+img_data = np.expand_dims(norm_img_data, axis=0)
+
+###############################################################################
+# Compile the Model With Relay
+# ----------------------------
+#
+# The next step is to compile the ResNet model. We begin by importing the model
+# to relay using the `from_onnx` importer. We then build the model, with
+# standard optimizations, into a TVM library.  Finally, we create a TVM graph
+# runtime module from the library.
+
+target = "llvm"
+
+######################################################################
+# .. note:: Defining the Correct Target
+#
+#   Specifying the correct target can have a huge impact on the performance of
+#   the compiled module, as it can take advantage of hardware features
+#   available on the target. For more information, please refer to `Auto-tuning
+#   a convolutional network for x86 CPU
+#   <https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_x86.html#define-network>`_.
+#   We recommend identifying which CPU you are running, along with optional
+#   features, and set the target appropriately. For example, for some
+#   processors ``target = "llvm -mcpu=skylake"``, or ``target = "llvm
+#   -mcpu=skylake-avx512"`` for processors with the AVX-512 vector instruction
+#   set.
+#
+
+# The input name may vary across model types. You can use a tool
+# like netron to check input names
+input_name = "data"
+shape_dict = {input_name: img_data.shape}
+
+mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
+
+with tvm.transform.PassContext(opt_level=1):
+    lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+######################################################################
+# Execute on TVM Runtime
+# ----------------------
+# Now that we've compiled the model, we can use the TVM runtime to make
+# predictions with it.  To use TVM to run the model and make predictions, we
+# need two things:
+#
+# - The compiled model, which we just produced.
+# - Valid input to the model to make predictions on.
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+################################################################################
+# Collect Basic Performance Data
+# ------------------------------
+# We want to collect some basic performance data associated with this
+# unoptimized model and compare it to a tuned model later.  To help account for
+# CPU noise, we run the computation in multiple batches in multiple
+# repetitions, then gather some basic statistics on the mean, median, and
+# standard deviation.
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+unoptimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+unoptimized = {
+    "mean": np.mean(unoptimized),
+    "median": np.median(unoptimized),
+    "std": np.std(unoptimized),
+}
+
+print(unoptimized)
+
+################################################################################
+# Postprocess the output
+# ---------------------------------------------
+#
+# As previously mentioned, each model will have its own particular way of
+# providing output tensors.
+#
+# In our case, we need to run some post-processing to render the outputs from
+# ResNet-50-V2 into a more human-readable form, using the lookup-table provided
+# for the model.
+
+from scipy.special import softmax
+
+# Download a list of labels
+labels_url = "https://s3.amazonaws.com/onnx-model-zoo/synset.txt"
+labels_path = download_testdata(labels_url, "synset.txt", module="data")
+
+with open(labels_path, "r") as f:
+    labels = [l.rstrip() for l in f]
+
+# Open the output and read the output tensor
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+################################################################################
+# This should produce the following output:
+#
+# .. code-block:: bash
+#
+#     # class='n02123045 tabby, tabby cat' with probability=0.610553
+#     # class='n02123159 tiger cat' with probability=0.367179
+#     # class='n02124075 Egyptian cat' with probability=0.019365
+#     # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#     # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Tune the model
+# ---------------------------------------------
+# The previous model was compiled to work on the TVM runtime, but did not
+# include any platform specific optimization. In this section, we will show you
+# how to build an optimized model using TVM to target your working platform.
+#
+# In some cases, we might not get the expected performance when running
+# inferences using our compiled module.  In cases like this, we can make use of
+# the auto-tuner, to find a better configuration for our model and get a boost
+# in performance. Tuning in TVM refers to the process by which a model is
+# optimized to run faster on a given target. This differs from training or
+# fine-tuning in that it does not affect the accuracy of the model, but only
+# the runtime performance. As part of the tuning process, TVM will try running
+# many different operator implementation variants to see which perform best.
+# The results of these runs are stored in a tuning records file.
+#
+# In the simplest form, tuning requires you to provide three things:
+#
+# - the target specification of the device you intend to run this model on
+# - the path to an output file in which the tuning records will be stored
+# - a path to the model to be tuned.
+#
+
+import tvm.auto_scheduler as auto_scheduler
+from tvm.autotvm.tuner import XGBTuner
+from tvm import autotvm
+
+# set up some basic parameters
+number = 10
+repeat = 1
+min_repeat_ms = 0  # since we're tuning on a CPU, can be set to 0
+timeout = 10  # in seconds
+
+# begin by extracting the tasks from the ONNX model
+tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)
+
+# create a TVM runner
+runner = autotvm.LocalRunner(
+    number=number,
+    repeat=repeat,
+    timeout=timeout,
+    min_repeat_ms=min_repeat_ms,
+)
+
+# create a simple structure for holding tuning options
+# For a production job, you will want to set the number of trials to
+# be larger, at least 100. Tuning can be time intensive, so the value
+# is set to a small number here in the interest of having faster
+# running time.
+tuning_option = {
+    "tuner": "xgb",
+    "trials": 10,
+    "early_stopping": 100,
+    "measure_option": autotvm.measure_option(
+        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
+    ),
+    "tuning_records": "resnet-50-v2-autotuning.json",
+}
+
+################################################################################
+# .. note:: Defining the Tuning Search Algorithm
+#
+#   By default this search is guided using an `XGBoost Grid` algorithm.
+#   Depending on your model complexity and amount of time avilable, you might
+#   want to choose a different algorithm.
+
+
+################################################################################
+# .. note:: Setting Tuning Parameters
+#
+#   In this example, in the interest of time, we set the number of trials to 10
+#   and early stopping to 100. You will likely see more performance improvements
+#   if you set these values to be higher, but this comes at the expense of time
+#   spent tuning. The number of trials required for convergence will vary
+#   depending on the specifics of the model and the target platform.
+
+# Identify the tasks that can be tuned, and iterate through them
+for i, task in enumerate(tasks):
+    prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
+    tuner_obj = XGBTuner(task, loss_type="rank")
+    tuner_obj.tune(
+        n_trial=min(tuning_option["trials"], len(task.config_space)),
+        early_stopping=tuning_option["early_stopping"],
+        measure_option=tuning_option["measure_option"],
+        callbacks=[
+            autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
+            autotvm.callback.log_to_file(tuning_option["tuning_records"]),
+        ],
+    )
+
+################################################################################
+# The output from this tuning process will look something like this:
+#
+# .. code-block:: bash
+#
+#   # [Task  1/24]  Current/Best:   10.71/  21.08 GFLOPS | Progress: (60/1000) | 111.77 s Done.
+#   # [Task  1/24]  Current/Best:    9.32/  24.18 GFLOPS | Progress: (192/1000) | 365.02 s Done.
+#   # [Task  2/24]  Current/Best:   22.39/ 177.59 GFLOPS | Progress: (960/1000) | 976.17 s Done.
+#   # [Task  3/24]  Current/Best:   32.03/ 153.34 GFLOPS | Progress: (800/1000) | 776.84 s Done.
+#   # [Task  4/24]  Current/Best:   11.96/ 156.49 GFLOPS | Progress: (960/1000) | 632.26 s Done.
+#   # [Task  5/24]  Current/Best:   23.75/ 130.78 GFLOPS | Progress: (800/1000) | 739.29 s Done.
+#   # [Task  6/24]  Current/Best:   38.29/ 198.31 GFLOPS | Progress: (1000/1000) | 624.51 s Done.
+#   # [Task  7/24]  Current/Best:    4.31/ 210.78 GFLOPS | Progress: (1000/1000) | 701.03 s Done.
+#   # [Task  8/24]  Current/Best:   50.25/ 185.35 GFLOPS | Progress: (972/1000) | 538.55 s Done.
+#   # [Task  9/24]  Current/Best:   50.19/ 194.42 GFLOPS | Progress: (1000/1000) | 487.30 s Done.
+#   # [Task 10/24]  Current/Best:   12.90/ 172.60 GFLOPS | Progress: (972/1000) | 607.32 s Done.
+#   # [Task 11/24]  Current/Best:   62.71/ 203.46 GFLOPS | Progress: (1000/1000) | 581.92 s Done.
+#   # [Task 12/24]  Current/Best:   36.79/ 224.71 GFLOPS | Progress: (1000/1000) | 675.13 s Done.
+#   # [Task 13/24]  Current/Best:    7.76/ 219.72 GFLOPS | Progress: (1000/1000) | 519.06 s Done.
+#   # [Task 14/24]  Current/Best:   12.26/ 202.42 GFLOPS | Progress: (1000/1000) | 514.30 s Done.
+#   # [Task 15/24]  Current/Best:   31.59/ 197.61 GFLOPS | Progress: (1000/1000) | 558.54 s Done.
+#   # [Task 16/24]  Current/Best:   31.63/ 206.08 GFLOPS | Progress: (1000/1000) | 708.36 s Done.
+#   # [Task 17/24]  Current/Best:   41.18/ 204.45 GFLOPS | Progress: (1000/1000) | 736.08 s Done.
+#   # [Task 18/24]  Current/Best:   15.85/ 222.38 GFLOPS | Progress: (980/1000) | 516.73 s Done.
+#   # [Task 19/24]  Current/Best:   15.78/ 203.41 GFLOPS | Progress: (1000/1000) | 587.13 s Done.
+#   # [Task 20/24]  Current/Best:   30.47/ 205.92 GFLOPS | Progress: (980/1000) | 471.00 s Done.
+#   # [Task 21/24]  Current/Best:   46.91/ 227.99 GFLOPS | Progress: (308/1000) | 219.18 s Done.
+#   # [Task 22/24]  Current/Best:   13.33/ 207.66 GFLOPS | Progress: (1000/1000) | 761.74 s Done.
+#   # [Task 23/24]  Current/Best:   53.29/ 192.98 GFLOPS | Progress: (1000/1000) | 799.90 s Done.
+#   # [Task 24/24]  Current/Best:   25.03/ 146.14 GFLOPS | Progress: (1000/1000) | 1112.55 s Done.
+
+################################################################################
+# Compiling an Optimized Model with Tuning Data
+# ----------------------------------------------
+#
+# As an output of the tuning process above, we obtained the tuning records
+# stored in ``resnet-50-v2-autotuning.json``. The compiler will use the results to
+# generate high performance code for the model on your specified target.
+#
+# Now that tuning data for the model has been collected, we can re-compile the
+# model using optimized operators to speed up our computations.
+
+with autotvm.apply_history_best(tuning_option["tuning_records"]):
+    with tvm.transform.PassContext(opt_level=3, config={}):
+        lib = relay.build(mod, target=target, params=params)
+
+dev = tvm.device(str(target), 0)
+module = graph_runtime.GraphModule(lib["default"](dev))
+
+################################################################################
+# Verify that the optimized model runs and produces the same results:
+
+dtype = "float32"
+module.set_input(input_name, img_data)
+module.run()
+output_shape = (1, 1000)
+tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).asnumpy()
+
+scores = softmax(tvm_output)
+scores = np.squeeze(scores)
+ranks = np.argsort(scores)[::-1]
+for rank in ranks[0:5]:
+    print("class='%s' with probability=%f" % (labels[rank], scores[rank]))
+
+# Verifying that the predictions are the same:
+#
+# .. code-block:: bash
+#
+#   # class='n02123045 tabby, tabby cat' with probability=0.610550
+#   # class='n02123159 tiger cat' with probability=0.367181
+#   # class='n02124075 Egyptian cat' with probability=0.019365
+#   # class='n02129604 tiger, Panthera tigris' with probability=0.001273
+#   # class='n04040759 radiator' with probability=0.000261
+
+################################################################################
+# Comparing the Tuned and Untuned Models
+# --------------------------------------
+# We want to collect some basic performance data associated with this optimized
+# model to compare it to the unoptimized model. Depending on your underlying
+# hardware, number of iterations, and other factors, you should see a performance
+# improvement in comparing the optimized model to the unoptimized model.
+
+import timeit
+
+timing_number = 10
+timing_repeat = 10
+optimized = (
+    np.array(timeit.Timer(lambda: module.run()).repeat(repeat=timing_repeat, number=timing_number))
+    * 1000
+    / timing_number
+)
+optimized = {"mean": np.mean(optimized), "median": np.median(optimized), "std": np.std(optimized)}
+
+
+print("optimized: %s" % (optimized))
+print("unoptimized: %s" % (unoptimized))
+
+################################################################################
+# Final Remarks
+# -------------
+#
+# In this tutorial, we gave a short example of how to use the TVM Python API
+# to compile, run, and tune a model.  We also discussed the need for pre and

Review comment:
       ```suggestion
   # to compile, run, and tune a model. We also discussed the need for pre and
   ```



