Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/09/01 02:29:22 UTC
[GitHub] [tvm] masahi commented on a diff in pull request #12318: [TVM PyTorch Integration] optimized_torch & as_torch how-to guide

masahi commented on code in PR #12318:
URL: https://github.com/apache/tvm/pull/12318#discussion_r960153711


##########
gallery/how_to/work_with_pytorch/using_as_torch.py:
##########
@@ -0,0 +1,172 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Wrap Your TVMscript with PyTorch Module
+======================
+**Author**: 
+`Yaoda Zhou <https://github.com/juda>`_,
+`Masahiro Masuda <https://github.com/masahi>`_
+
+This article is an introductory tutorial on wrapping the TVMscript code with the PyTorch module.
+By the decorator `as_torch`, users can wrap a TVMscript code into a PyTorch nn.Module naturally.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch, as well as necessary libraries
+import torch
+import torch.nn.functional as F
+import torch.utils.benchmark as benchmark
+
+import tvm
+from tvm.contrib.torch import as_torch
+from tvm.script import tir as T
+
+######################################################################
+# Write your own PyTorch operator by TVMscript
+# -------------------------------
+# PyTorch is a very popular machine learning framework which contains
+# optimized implementations of most commonly used operators.
+# Nevertheless, sometimes you might want to write your own operators in PyTorch.
+# In that case, the performance of such custom operators might not be satisfactory for your needs.
+#
+# One of the examples is to define a 1-d depthwise convolution operator.
+# Assume the number of in_channel and out_channel are both 70,
+# the width is 80 and the kernel size is 20,
+# then the 1-d depthwise conv could be written in PyTorch in one line:
+
+in_channel = 70
+out_channel = 70
+width = 80
+kernel_size = 20
+
+
+def torch_depthwise(inputs, filters):
+    return F.conv1d(inputs, filters.view(out_channel, 1, kernel_size), groups=out_channel)
+
+
+# We can run this function as:
+
+inputs = torch.randn(in_channel, width)
+filters = torch.randn(out_channel, kernel_size)
+ret_torch = torch_depthwise(inputs, filters)
+
+# The `torch_depthwise` function, in a plain Python code, could be written as:
+
+
+def vanilla_depthwise(input, weight):
+    ret = torch.zeros(out_channel, width - kernel_size + 1)
+    for j in range(out_channel):
+        for i in range(width - kernel_size + 1):
+            for k in range(kernel_size):
+                ret[j, i] += weight[j, k] * input[j, i + k]
+    return ret
+
+
+# Then, we plan to optimize the `depthwise` function by leveraging the power of TVM.
+# TVM community proposes an embedded Domain Specific Language on Python call TVMscript,
+# which serves for an abstraction of program on various hardware backends.

Review Comment:
   I think calling TVMScript "an abstraction of program on various hardware backends" is a bit of a stretch. I think it is a much more high-level, concrete thing.



##########
gallery/how_to/work_with_pytorch/using_as_torch.py:
##########
@@ -0,0 +1,172 @@
+
+# As a concrete example, we can write such a TVMscript for 1-d depthwise conv code as below.
+# The computation procedure of `tvm_depthwise` is corresponding to the code snippet of `vanilla_depthwise`.
+
+# In our `tvm_depthwise` function, both inputs and outputs are set to be function parameters
+# that held on the multi-dimension buffers. For each buffer, the shape and data type information are required.
+# In the function body, there is a syntactic sugar `T.grid` for writing multiple nested iterators.
+# In the body of the loop, each computation is wrapped in an additional construct named `T.block`.
+# A block is a basic unit of computation. Inside the block, we need to provide a few more information about the block axes.
+# Here, 2 spatial and 1 reduce block iterators are created and bound to the loop iterators i, j and k.
+# The computations and machine learning compilation analysis will be defined around them.
+# The last 3 lines are computation statements, including an initialization of `C[vj, vi]` and the summing up along the axis k.
+# Finally, we place 2 decorators `T.prim_func` and `as_torch` above the definition of function,
+# which converts the Python AST to TVMscript AST and then converts to PyTorch's `nn.Module`.

Review Comment:
   These sentences might be too detailed for a tutorial intended for PT users. I prefer a more succinct summary of what TVMScript is about, not necessarily explaining all the syntactic constructs used in the example.
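
For readers less familiar with the terminology in the quoted comments, the loop structure they describe (two spatial axes and one reduction axis) can be sketched in plain Python. This is only an illustration: the function name `depthwise_1d` and the tiny sizes are hypothetical, and the sketch neither uses TVMScript nor reproduces the PR's `tvm_depthwise`.

```python
from itertools import product


def depthwise_1d(inputs, filters):
    """Plain-Python 1-d depthwise convolution over lists of lists (hypothetical helper)."""
    channels = len(inputs)
    width = len(inputs[0])
    kernel_size = len(filters[0])
    out_width = width - kernel_size + 1
    out = [[0.0] * out_width for _ in range(channels)]
    # j and i are spatial axes: each (j, i) pair owns one output element.
    # k is the reduction axis: its contributions are summed into out[j][i].
    for j, i, k in product(range(channels), range(out_width), range(kernel_size)):
        out[j][i] += filters[j][k] * inputs[j][i + k]
    return out


# Tiny illustrative sizes, not the 70/80/20 used in the tutorial.
inputs = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
filters = [[1.0, 1.0], [2.0, -1.0]]
result = depthwise_1d(inputs, filters)
print(result)
```

The single `product(...)` loop plays the role that the quoted text assigns to `T.grid`: it is shorthand for three nested loops.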



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)

Review Comment:
   "meta" (as in `model_optimized_by_meta`) would mean "Facebook" to most PT users.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.

Review Comment:
   deploy -> tune



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)

Review Comment:
   I think it's more important to compare the results from (1) vanilla PT and (2) the model after conversion to TVM.
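
A minimal sketch of the kind of check suggested here, written without PyTorch or TVM so that it runs anywhere. The helper name `assert_matches_reference` and the two stand-in "models" are hypothetical; the tolerance rule is assumed to be the same one `numpy.testing.assert_allclose` applies, with the `rtol`/`atol` values taken from the quoted hunk.

```python
def assert_matches_reference(reference_fn, candidate_fn, example, rtol=1e-5, atol=1e-5):
    """Run both callables on the same input and compare the outputs element-wise.

    An element passes if |actual - expected| <= atol + rtol * |expected|.
    Returns the largest absolute difference observed.
    """
    expected = reference_fn(example)
    actual = candidate_fn(example)
    if len(expected) != len(actual):
        raise AssertionError(f"length mismatch: {len(expected)} vs {len(actual)}")
    max_diff = 0.0
    for index, (e, a) in enumerate(zip(expected, actual)):
        diff = abs(a - e)
        if diff > atol + rtol * abs(e):
            raise AssertionError(f"element {index}: expected {e}, got {a}")
        max_diff = max(max_diff, diff)
    return max_diff


# Hypothetical stand-ins: the "converted" model differs only by small numeric noise.
def eager_model(xs):
    return [max(0.0, 2.0 * x - 1.0) for x in xs]


def converted_model(xs):
    return [max(0.0, 2.0 * x - 1.0) + 1e-7 for x in xs]


example = [-1.0, 0.0, 0.5, 1.0, 3.0]
max_diff = assert_matches_reference(eager_model, converted_model, example)
print(f"max abs diff: {max_diff:.1e}")
```

In the tutorial's setting, `reference_fn` would presumably be the original PyTorch module and `candidate_fn` the module returned by `optimize_torch`, with outputs flattened to lists first. The same module instance would have to be used on both sides, since each fresh `SimpleModel()` gets its own randomly initialized weights.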



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.
+target_cuda = "nvidia/geforce-rtx-3070"
+
+# The default setting is adapted automatically by the number of operations of the optimized model
+# When needed, we can define the configuration by ourselves, like:
+tuning_config = TuneConfig(
+    strategy="evolutionary",
+    num_trials_per_iter=64,
+    max_trials_per_task=20000,
+    max_trials_global=20000,
+)
+
+# For PyTorch users, the nn.Module could be written as usual, except for
+# applying "optimize_torch" function on the resnet18 model.
+# In such a way, we obtain a new resnet18 model optimized by MetaSchedule.

Review Comment:
   I don't know what you are trying to say here. How about just saying: "To optimize the resnet18 model with MetaSchedule, you can apply the `optimize_torch` function."



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+
+
+class MyResNet18(torch.nn.Module):
+    def __init__(self, config, target=None):
+        super(MyResNet18, self).__init__()
+        self.means = torch.nn.Parameter(
+            torch.tensor([103.939, 116.779, 123.68]).resize_(1, 3, 1, 1)
+        ).cuda()
+        # Here we impose the `optimize_torch` function
+        self.resnet = optimize_torch(resnet18(), [torch.rand(1, 3, 224, 224)], config, target)
+
+    def forward(self, input):
+        return self.resnet(input - self.means)
+
+
+# Since we set the number of trials largely,
+# we might need to wait more time for the search.
+meta_module_resnet18 = MyResNet18(tuning_config, target_cuda)
+
+
+######################################################################
+# Define the resnet18 optimized by TorchScript
+# ------------------------------
+# Besides, let us define a resnet18 model in a standard way.
+# TorchScript also provides a built-in "optimize_for_inference" function to accelerate the inference,
+# we will compare the performance of those two optimizers later.
+
+
+class JitModule(torch.nn.Module):
+    def __init__(self):
+        super(JitModule, self).__init__()
+        self.means = torch.nn.Parameter(
+            torch.tensor([103.939, 116.779, 123.68]).resize_(1, 3, 1, 1)
+        ).cuda()
+        # Here we impose the `optimize_for_inference` function
+        self.resnet = torch.jit.optimize_for_inference(torch.jit.script(resnet18().cuda().eval()))
+
+    def forward(self, input):
+        return self.resnet(input - self.means)
+
+
+jit_module_resnet18 = JitModule()
+
+######################################################################
+# Compare the performance between two scheduling approaches.
+# ------------------------------
+# Using PyTorch's benchmark Compare class, we can have a direct comparison result between two inference models.
+
+results = []
+for i in range(5):
+    test_input = torch.rand(1, 3, 224, 224).half().cuda()
+    sub_label = f"[test {i}]"
+    results.append(
+        benchmark.Timer(
+            stmt="meta_module_resnet18(test_input)",
+            setup="from __main__ import meta_module_resnet18",
+            globals={"test_input": test_input},
+            sub_label=sub_label,
+            description="tuning by meta",
+        ).blocked_autorange()
+    )
+    results.append(
+        benchmark.Timer(
+            stmt="jit_module_resnet18(test_input)",
+            setup="from __main__ import jit_module_resnet18",
+            globals={"test_input": test_input},
+            sub_label=sub_label,
+            description="tuning by jit",
+        ).blocked_autorange()
+    )
+
+# We can print the results on screen.
+compare = benchmark.Compare(results)
+compare.print()
+
+# In the working machine, the average inference time by `optimized_torch` is 860.5 us,
+# while the average inference time of `jit_optimized` is 1156.3 us,
+# showing the performance arises by around 1/4.

Review Comment:
   Apply my comment from the `as_torch` tutorial here too. I won't repeat the same comment.
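
As a quick check on the figures quoted at the end of the hunk above (860.5 us and 1156.3 us, both taken from the tutorial text), the relative change can be worked out directly. The variable names below are illustrative only.

```python
tvm_us = 860.5   # average latency quoted for `optimized_torch`
jit_us = 1156.3  # average latency quoted for `jit_optimized`

latency_reduction = (jit_us - tvm_us) / jit_us  # fraction of time saved
speedup = jit_us / tvm_us                       # how many times faster

print(f"latency reduction: {latency_reduction:.1%}")
print(f"speedup: {speedup:.2f}x")
```

This works out to roughly a 25.6% reduction in latency, or about a 1.34x speedup, which is consistent with the "around 1/4" stated in the tutorial.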



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.

Review Comment:
   Drop this sentence. "Working machine" is not idiomatic English, and "the GPU is nvidia/geforce-rtx-3070" doesn't add any new information.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.
+target_cuda = "nvidia/geforce-rtx-3070"
+
+# The default setting is adapted automatically by the number of operations of the optimized model
+# When needed, we can define the configuration by ourselves, like:
+tuning_config = TuneConfig(
+    strategy="evolutionary",
+    num_trials_per_iter=64,
+    max_trials_per_task=20000,
+    max_trials_global=20000,
+)
+
+# For PyTorch users, the nn.Module could be written as usual, except for
+# applying "optimize_torch" function on the resnet18 model.
+# In such a way, we obtain a new resnet18 model optimized by MetaSchedule.
+
+
+class MyResNet18(torch.nn.Module):

Review Comment:
   We shouldn't need the `MyResNet18` boilerplate. Subtracting `means` from the input is irrelevant to this tutorial, and the wrapper class just adds noise.
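
   A rough sketch of what I have in mind, assuming the object returned by `optimize_torch` can be called directly like any other module (as the `SimpleModel` example earlier in this file does). The helper name `build_optimized_resnet18` is hypothetical, and I have not run this end to end on a GPU:

```python
def build_optimized_resnet18(tuning_config, target):
    """Return resnet18 tuned by MetaSchedule, without a wrapper class."""
    # Local imports so the sketch can be loaded without TVM or
    # TorchVision installed; they are only needed when it is called.
    import torch
    from torchvision.models import resnet18
    from tvm.contrib.torch import optimize_torch

    example_inputs = [torch.rand(1, 3, 224, 224)]
    # Same call as in `MyResNet18.__init__`, minus the class and the
    # mean subtraction.
    return optimize_torch(resnet18(), example_inputs, tuning_config, target)
```

   The tutorial would then just do `meta_module_resnet18 = build_optimized_resnet18(tuning_config, target_cuda)`, or inline the single `optimize_torch` call.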



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.
+target_cuda = "nvidia/geforce-rtx-3070"
+
+# The default setting is adapted automatically by the number of operations of the optimized model
+# When needed, we can define the configuration by ourselves, like:

Review Comment:
   I understand what you mean here, but for new users who are not familiar with TVM or MetaSchedule, this sentence probably won't make any sense.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.

Review Comment:
   Suggested wording: "The PyTorch model to optimize, along with its example input, is provided by the user."



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.

Review Comment:
   Drop "see the result". You are not showing "the result", whatever that is.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.

Review Comment:
   Explain what "the default configuration" is and what the "first example" refers to.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.
+target_cuda = "nvidia/geforce-rtx-3070"
+
+# The default setting is adapted automatically by the number of operations of the optimized model
+# When needed, we can define the configuration by ourselves, like:
+tuning_config = TuneConfig(
+    strategy="evolutionary",
+    num_trials_per_iter=64,
+    max_trials_per_task=20000,
+    max_trials_global=20000,
+)
+
+# For PyTorch users, the nn.Module could be written as usual, except for
+# applying "optimize_torch" function on the resnet18 model.
+# In such a way, we obtain a new resnet18 model optimized by MetaSchedule.
+
+
+class MyResNet18(torch.nn.Module):
+    def __init__(self, config, target=None):
+        super(MyResNet18, self).__init__()
+        self.means = torch.nn.Parameter(
+            torch.tensor([103.939, 116.779, 123.68]).resize_(1, 3, 1, 1)
+        ).cuda()
+        # Here we impose the `optimize_torch` function
+        self.resnet = optimize_torch(resnet18(), [torch.rand(1, 3, 224, 224)], config, target)
+
+    def forward(self, input):
+        return self.resnet(input - self.means)
+
+
+# Since we set the number of trials largely,
+# we might need to wait more time for the search.
+meta_module_resnet18 = MyResNet18(tuning_config, target_cuda)
+
+
+######################################################################
+# Define the resnet18 optimized by TorchScript
+# ------------------------------
+# Besides, let us define a resnet18 model in a standard way.
+# TorchScript also provides a built-in "optimize_for_inference" function to accelerate the inference,
+# we will compare the performance of those two optimizers later.

Review Comment:
   Same comment as "Define the resnet18 optimized by MetaSchedule" above. 



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting
+
+# We will deploy our model on the GPU.
+# In the working machine, the GPU is nvidia/geforce-rtx-3070.
+target_cuda = "nvidia/geforce-rtx-3070"
+
+# The default setting is adapted automatically by the number of operations of the optimized model
+# When needed, we can define the configuration by ourselves, like:
+tuning_config = TuneConfig(
+    strategy="evolutionary",
+    num_trials_per_iter=64,
+    max_trials_per_task=20000,
+    max_trials_global=20000,
+)
+
+# For PyTorch users, the nn.Module could be written as usual, except for
+# applying "optimize_torch" function on the resnet18 model.
+# In such a way, we obtain a new resnet18 model optimized by MetaSchedule.
+
+
+class MyResNet18(torch.nn.Module):
+    def __init__(self, config, target=None):
+        super(MyResNet18, self).__init__()
+        self.means = torch.nn.Parameter(
+            torch.tensor([103.939, 116.779, 123.68]).resize_(1, 3, 1, 1)
+        ).cuda()
+        # Here we impose the `optimize_torch` function
+        self.resnet = optimize_torch(resnet18(), [torch.rand(1, 3, 224, 224)], config, target)
+
+    def forward(self, input):
+        return self.resnet(input - self.means)
+
+
+# Since we set the number of trials largely,
+# we might need to wait more time for the search.

Review Comment:
   Please revise this sentence.
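
   To make the revised sentence concrete, it could say how large the search budget actually is. A back-of-the-envelope sketch, assuming the global trial budget is consumed in batches of `num_trials_per_iter` (that is my reading of the `TuneConfig` fields, not something this file states):

```python
import math

# Values taken from the `TuneConfig` in the tutorial.
num_trials_per_iter = 64
max_trials_global = 20000

# Number of search iterations needed to use up the whole budget,
# under the batching assumption stated above.
iterations = math.ceil(max_trials_global / num_trials_per_iter)
print(iterations)
```

   Something like "With 20000 trials the search can take a long time" would then be both clearer and easier to justify than the current wording.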



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+# Import library for profiling
+import torch.utils.benchmark as benchmark
+from torchvision.models import resnet18
+
+# Import `optimize_torch` function
+from tvm.contrib.torch import optimize_torch
+from tvm.meta_schedule import TuneConfig
+
+######################################################################
+# Define a simple module written by PyTorch
+# ------------------------------
+
+
+class SimpleModel(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = nn.Conv2d(1, 20, 5)
+        self.conv2 = nn.Conv2d(20, 20, 5)
+
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        return F.relu(self.conv2(x))
+
+
+######################################################################
+# Optimized SimpleModel by TVM MetaSchedule
+# ------------------------------
+# We provide a `optimize_torch` function, which has the similar usage as `torch.jit.trace`.
+# The optimized function/model and example input are required to provide by users.
+# If the third parameter `tuning_config` is not provided, a default configuration is loaded.
+# If the parameter `target` is empty, the model will deploy on the CPU.

Review Comment:
   This is related to one of my comments earlier: Users who are new to `optimize_torch` don't know anything about `tuning_config` or `target`. So these two sentences are not useful.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+
+
+example_input = torch.randn(20, 1, 10, 10)
+
+# We use the default configuration for the first example.
+model_optimized_by_meta = optimize_torch(SimpleModel(), example_input)
+
+######################################################################
+# Save/Load module
+# ------------------------------
+# We can save and load our tuned module like the standard `nn.module`.
+
+# Let us run our tuned module and see the result.
+ret1 = model_optimized_by_meta(example_input)
+
+torch.save(model_optimized_by_meta, "meta_model.pt")
+model_loaded = torch.load("meta_model.pt")
+
+# We load the module and run it again, and it will return the same result as above.
+ret2 = model_loaded(example_input)
+
+testing.assert_allclose(ret1.numpy(), ret2.numpy(), atol=1e-5, rtol=1e-5)
+
+######################################################################
+# Define the resnet18 optimized by MetaSchedule
+# ------------------------------
+# In another example, we compare the two optimizers about the performance of resnet18
+# For learning how to define a resnet18 model via PyTorch's nn.Module,
+# you can refer to https://pytorch.org/docs/stable/jit.html#mixing-tracing-and-scripting

Review Comment:
   Please revise these three sentences. Resnet18 is already defined by PT; we are not "defining" it. What are "the two optimizers"? Since this is a tutorial intended for PT users, we don't need to teach them how to define resnet18 in PT, and the link is irrelevant.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+"""
+Compile PyTorch Models
+======================
+**Author**: `Yaoda Zhou <https://github.com/juda/>`_
+This article is an introductory tutorial to optimize PyTorch models by using `tvm.contrib.torch.optimize_torch`.
+For us to follow this tutorial, PyTorch, as well as TorchVision, should be installed.

Review Comment:
   I think you copied "For us to follow this tutorial" from other tutorials, but this is not a good English phrase. We can just say "To follow this tutorial".



##########
gallery/how_to/work_with_pytorch/using_as_torch.py:
##########
@@ -0,0 +1,172 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+Wrap Your TVMscript with PyTorch Module
+======================
+**Author**: 
+`Yaoda Zhou <https://github.com/juda>`_,
+`Masahiro Masuda <https://github.com/masahi>`_
+
+This article is an introductory tutorial on wrapping the TVMscript code with the PyTorch module.
+By the decorator `as_torch`, users can wrap a TVMscript code into a PyTorch nn.Module naturally.
+"""
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+# Import PyTorch, as well as necessary libraries
+import torch
+import torch.nn.functional as F
+import torch.utils.benchmark as benchmark
+
+import tvm
+from tvm.contrib.torch import as_torch
+from tvm.script import tir as T
+
+######################################################################
+# Write your own PyTorch operator by TVMscript
+# -------------------------------
+# PyTorch is a very popular machine learning framework which contains
+# optimized implementations of most commonly used operators.
+# Nevertheless, sometimes you might want to write your own operators in PyTorch.
+# In that case, the performance of such custom operators might not be satisfactory for your needs.
+#
+# One of the examples is to define a 1-d depthwise convolution operator.

Review Comment:
   "For example, suppose we want to define..."

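   For readers who have not met the operator, a plain-Python reference may make the example concrete. This is only a sketch and not the tutorial's implementation: `depthwise_conv1d` is a hypothetical helper, and it assumes stride 1 and no padding. Each channel is filtered with its own kernel, and channels are never mixed.

```python
from typing import List, Sequence


def depthwise_conv1d(
    signal: Sequence[Sequence[float]], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Depthwise 1-d convolution (cross-correlation, as in deep learning frameworks).

    signal:  one sequence of samples per channel, shape [channels][length]
    kernels: one filter per channel, shape [channels][kernel_size]
    Assumes stride 1 and no padding, so each output row has
    length - kernel_size + 1 entries.
    """
    if len(signal) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    output = []
    for channel, kernel in zip(signal, kernels):
        k = len(kernel)
        if k == 0 or k > len(channel):
            raise ValueError("kernel must be non-empty and no longer than the signal")
        # Slide the channel's own kernel over that channel only.
        output.append(
            [
                sum(channel[i + j] * kernel[j] for j in range(k))
                for i in range(len(channel) - k + 1)
            ]
        )
    return output


# Two channels, each with its own 2-tap filter.
print(depthwise_conv1d([[1, 2, 3, 4], [10, 20, 30, 40]], [[1, 1], [1, -1]]))
# [[3, 5, 7], [-10, -10, -10]]
```

   A reference like this is slow, which is the motivation the paragraph is building toward.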


##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+meta_module_resnet18 = MyResNet18(tuning_config, target_cuda)
+
+
+######################################################################
+# Define the resnet18 optimized by TorchScript
+# ------------------------------
+# Besides, let us define a resnet18 model in a standard way.
+# TorchScript also provides a built-in "optimize_for_inference" function to accelerate the inference,
+# we will compare the performance of those two optimizers later.
+
+
+class JitModule(torch.nn.Module):
+    def __init__(self):
+        super(JitModule, self).__init__()
+        self.means = torch.nn.Parameter(
+            torch.tensor([103.939, 116.779, 123.68]).resize_(1, 3, 1, 1)
+        ).cuda()
+        # Here we impose the `optimize_for_inference` function
+        self.resnet = torch.jit.optimize_for_inference(torch.jit.script(resnet18().cuda().eval()))
+
+    def forward(self, input):
+        return self.resnet(input - self.means)
+
+
+jit_module_resnet18 = JitModule()
+
+######################################################################
+# Compare the performance between two scheduling approaches.
+# ------------------------------
+# Using PyTorch's benchmark Compare class, we can have a direct comparison result between two inference models.
+
+results = []
+for i in range(5):
+    test_input = torch.rand(1, 3, 224, 224).half().cuda()
+    sub_label = f"[test {i}]"
+    results.append(
+        benchmark.Timer(
+            stmt="meta_module_resnet18(test_input)",
+            setup="from __main__ import meta_module_resnet18",
+            globals={"test_input": test_input},
+            sub_label=sub_label,
+            description="tuning by meta",
+        ).blocked_autorange()
+    )
+    results.append(
+        benchmark.Timer(
+            stmt="jit_module_resnet18(test_input)",
+            setup="from __main__ import jit_module_resnet18",
+            globals={"test_input": test_input},
+            sub_label=sub_label,
+            description="tuning by jit",
+        ).blocked_autorange()
+    )
+
+# We can print the results on screen.

Review Comment:
   Drop this sentence



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+######################################################################
+# Define the resnet18 optimized by TorchScript
+# ------------------------------
+# Besides, let us define a resnet18 model in a standard way.
+# TorchScript also provides a built-in "optimize_for_inference" function to accelerate the inference,
+# we will compare the performance of those two optimizers later.
+
+
+class JitModule(torch.nn.Module):

Review Comment:
   Drop `JitModule` boilerplate.



##########
gallery/how_to/work_with_pytorch/using_optimized_torch.py:
##########
@@ -0,0 +1,193 @@
+######################################################################
+# Compare the performance between two scheduling approaches.

Review Comment:
   What are "two scheduling approaches"? `torch.jit.optimize_for_inference` is not a scheduling approach.


